<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Baby Steps]]></title>
  <link href="http://smallcultfollowing.com/babysteps/atom.xml" rel="self"/>
  <link href="http://smallcultfollowing.com/babysteps/"/>
  <updated>2012-02-18T10:05:37-08:00</updated>
  <id>http://smallcultfollowing.com/babysteps/</id>
  <author>
    <name><![CDATA[Nicholas D. Matsakis]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[Versioning considered OK]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/02/18/versioning-considered-ok/"/>
    <updated>2012-02-18T09:59:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/02/18/versioning-considered-ok</id>
    <content type="html"><![CDATA[<p>Marijn pointed out to me that our current setup should avoid the worst
of the versioning problems I was afraid of.  In the snapshot, we
package up a copy of the compiler along with its associated libraries,
and use this compiler to produce the new compiler.  The new compiler
can then compile its own target libraries, thus avoiding the need to
interact with libraries produced by the snapshot.</p>

<p>Of course, I should have known this, since I have relied on this so
that I can change the metadata format without worrying about
backwards compatibility.  That&#8217;s what I get for writing blog posts
late at night.</p>

<p>Anyhow, the good news is that we are able to serialize and deserialize
AST trees faithfully, and I have written (but not tested) the code to
serialize the side tables.  I am now working on the deserialization
code and the pass which will instantiate sources to be inlined.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[CCI and versioning]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/02/17/cci-and-versioning/"/>
    <updated>2012-02-17T21:40:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/02/17/cci-and-versioning</id>
    <content type="html"><![CDATA[<p>I&#8217;ve been busily implementing the Cross-Crate Inlining stuff, but one
area I haven&#8217;t looked at much is versioning.  In particular, if we are
going to be serializing the AST, we need a plan for what to do when
the AST changes.  Actually, if inlining were only to be used for
performance, we wouldn&#8217;t really <em>need</em> to have a plan: we could just
not inline when the AST appeared to be stored in some form we don&#8217;t
understand. However, if we fully monomorphize, we will not have that
luxury: without type descriptors, the only way to compile cross-crate,
generic calls will be by inlining.</p>

<p>This becomes particularly important because Rust is self-hosting.  In
particular, the compilation process begins by compiling the standard
libraries for use by later stages.  But if we change the form of the
AST, the snapshot compiler that bootstraps our compilation will still
be generating the older AST&#8212;so we had better have a way of reading
it!</p>

<p>I am not really sure what&#8217;s the best way to handle this.  I had always
assumed that once we reach 1.0, we would just keep a version of that
AST module around forever, and convert to the newer AST formats.  This
is a somewhat painful but acceptable price to pay, so long as the number
of versions is not too high.  But this scheme looks less attractive if
we have to do it for every field that we add to the AST.</p>

<p>In addition, there is another wrinkle I hadn&#8217;t really thought about:
alongside the AST, we also store the results of various analyses which
are used during code gen.  For example, there is an analysis that
indicates whether a variable is mutated, or whether a particular copy
can in fact be implemented with a move.  If new analyses are added in
the future (and they will be), we won&#8217;t have results available for
older crates, so we will have to be sure we can always get by without
those results.  In most cases, though, these results are just used to
generate faster code, so we can always generate less efficient code
without a problem.  But it is something that we nonetheless have to be
aware of&#8212;and it affects how the side table information is stored.
For example, keeping a set of variables that we can optimize better is
good, but keeping a set of variables for which we must be conservative
is bad.  This is because if the set leads to optimization, we can
always just use an empty set without affecting correctness.  But
anyhow this can all be handled with some code.</p>
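<p>As a rough sketch of that last point (written in later Rust syntax, and
not taken from the compiler), suppose the side table is a set of variable
ids whose copies may be turned into moves.  The names <code>movable_vars</code> and
<code>lower_copy</code> are hypothetical.  When the table is missing, the empty set
is a safe default:</p>

```rust
use std::collections::HashSet;

/// Hypothetical side table: ids of variables whose copy may become a move.
/// `None` models a crate that was serialized before this analysis existed.
fn movable_vars(stored: Option<HashSet<u32>>) -> HashSet<u32> {
    // Missing results fall back to the empty set: nothing is optimized,
    // but nothing is miscompiled either.
    stored.unwrap_or_default()
}

/// Code generation consults the set and falls back to a copy otherwise.
fn lower_copy(var: u32, movable: &HashSet<u32>) -> &'static str {
    if movable.contains(&var) { "move" } else { "copy" }
}

fn main() {
    let new_crate = movable_vars(Some(HashSet::from([1, 4])));
    let old_crate = movable_vars(None);
    println!("{}", lower_copy(4, &new_crate));
    println!("{}", lower_copy(4, &old_crate));
}
```

<p>Had the table instead listed the variables that <em>must</em> be treated
conservatively, an empty default would silently enable the optimization
everywhere, which is exactly the bad case.</p>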

<p>Anyway, what would be nicest is to have attributes in the AST to
indicate what kind of values should be provided for fields that are
missing and so forth.  This would mean that the serialization code
would have to get somewhat smarter, so that it can cope with things
like a record with fields that may or may not be present.  This is
where having automated the serialization process should really pay
off, though, since I can make these adjustments once and have all the
code be automatically adjusted.  Still, I&#8217;d have to figure out how to
best encode things so that I can figure out what data <em>is</em> present and
what is not, and what kinds of changes we should accept.</p>
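<p>To make the idea concrete, here is a minimal sketch in later Rust syntax.
Everything in it is an assumption for illustration: the <code>FnDecl</code> record,
its <code>inline_hint</code> field, and the field-name table standing in for the
real serialized form.  The decoder requires fields that have always existed
and substitutes a default for a field that older data does not contain:</p>

```rust
use std::collections::HashMap;

// Hypothetical AST node: `inline_hint` was added after some crates were
// already serialized, so older data will not contain it.
#[derive(Debug, PartialEq)]
struct FnDecl {
    name: String,
    inline_hint: bool,
}

// Decode from a field-name -> value table (a stand-in for the real format).
fn decode_fn_decl(fields: &HashMap<&str, &str>) -> Option<FnDecl> {
    // A field that has always existed is required.
    let name = fields.get("name")?.to_string();
    // A newer field falls back to its declared default when absent.
    let inline_hint = fields.get("inline_hint").map_or(false, |v| *v == "true");
    Some(FnDecl { name, inline_hint })
}

fn main() {
    let old = HashMap::from([("name", "foo")]);
    let new = HashMap::from([("name", "foo"), ("inline_hint", "true")]);
    println!("{:?}", decode_fn_decl(&old));
    println!("{:?}", decode_fn_decl(&new));
}
```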

<blockquote><p>One note: these kinds of &#8220;default-providing&#8221; attributes can be dropped
once a new snapshot is generated, except in cases where they are
required for backwards compatibility to some publicly supported
release.</p></blockquote>

<p>I had rather hoped to avoid these kinds of questions, at least for
now.  These seem like detailed questions that are the domain of a
specialized library.  But I think that they will be hard to avoid so
long as we are bootstrapping, as we will always have to deal with
executables generated based on the older AST definition.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Returning refs]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/02/16/returning-refs/"/>
    <updated>2012-02-16T08:06:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/02/16/returning-refs</id>
    <content type="html"><![CDATA[<p>One commonly requested feature for regions is the ability to return
references to the inside of structures.  I did not allow that in the
proposal in <a href="blog/2012/02/15/regions-lite-dot-dot-dot-ish/">my previous post</a> because I did not want to have any
region annotations beyond a simple <code>&amp;</code>.  I think, however, that if you
want to allow returning references to the interior of a parameter, you
need a way for the user to denote region names explicitly.</p>

<p>The big problem with returning references to the interior of data
structures is ensuring the lifetime and validity of that reference.  I
think it can be supported in some cases but not particularly well in
general.  It will work ok for structures allocated on the stack but be
quite limited for structures allocated in the heap, unless we have
strong support from the garbage collector.  Another scenario where it
could work well would be user-managed memory pools (which we probably
do want to support eventually).</p>

<h3>Simplest case</h3>

<p>Let&#8217;s start with the simplest case:</p>

<pre><code>type T = {mut f: uint};
fn get_f(t: r&amp;T) -&gt; r&amp;mut uint { &amp;t.f }
</code></pre>

<p>Here we have a function that returns a pointer to a field of its
parameter.  On the callee side, I think this is fairly
straightforward.  You name the region in which the parameter is
located (<code>r</code>) and say that the return value is a mutable pointer to
<code>uint</code> in the same region (<code>r&amp;mut uint</code>).  The notation may leave
something to be desired, but the concept is hopefully clear enough.</p>

<h4>Returning references to the stack</h4>

<p>But what about the caller?  Again we&#8217;ll start with the simplest case,
where <code>get_f()</code> is invoked with data that lives on the stack:</p>

<pre><code>fn caller_on_stack() {   // let region `b` refer to the function body
    let t = &amp;{mut f: 3}; // type of t: `b&amp;T`
    let p = get_f(t);    // type of p: `b&amp;mut uint`
    *p = 5;              // legal, `b` region is in scope.
}
</code></pre>

<p>In this case, there is no concern about memory management per se.  The
returned pointer is in the region <code>b</code>.  The type checker will enforce
the rule that if a function returns a reference type, the reference
must be in scope (note: I haven&#8217;t thought through what this means for
generic types with the <code>ref</code> kind, but I guess they can be handled one
way or another).  Generally, because the parameter regions must be in
scope for the call, the returned region will be in scope too&#8212;but
we&#8217;ll see that this does not always hold.</p>
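<p>For comparison only, the stack case can be sketched with a named lifetime
parameter in later Rust syntax, which is not the notation proposed in this
post.  The lifetime <code>'r</code> plays the role of the region <code>r</code>; the struct
and function names simply mirror the example above:</p>

```rust
struct T {
    f: u32,
}

// The named lifetime `'r` plays the role of the region `r` in the post:
// the returned pointer lives in the same region as the parameter.
fn get_f<'r>(t: &'r mut T) -> &'r mut u32 {
    &mut t.f
}

fn caller_on_stack() -> u32 {
    let mut t = T { f: 3 }; // lives in this function's stack frame
    let p = get_f(&mut t);  // the borrow is tied to a region inside this body
    *p = 5;                 // legal: that region is still in scope
    t.f
}

fn main() {
    println!("{}", caller_on_stack());
}
```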

<h4>Returning references to the heap, take 1</h4>

<p>Let&#8217;s move on to a more complicated case where the data being accessed
is in the heap.  I&#8217;ll first discuss how it could work in some
theoretical world and then show how this can cause problems:</p>

<pre><code>fn caller_on_heap() {    // (note: not fully sound)
    let t = @{mut f: 3}; // type of t: `@T`
    let p = get_f(t);    // type of p: `@mut uint`
    *p = 5;              // legal, `@` region is in scope.
}
</code></pre>

<p>The only difference is that the variable <code>t</code> is in the heap region
<code>@</code>.  Now the returned pointer is considered to be in that region as
well, and so the assignment is permitted.  This seems reasonable at
first, but it is of course incompatible with ref counting (<code>p</code> is not
a ref-counted entity). If we moved to a garbage collector which could
handle interior pointers, this would be reasonably safe.  Interior
pointer support is needed to accommodate cases where <code>p</code> gets returned,
like this one:</p>

<pre><code>fn caller_on_heap_r() -&gt; @mut uint {
    let t = @{mut f: 3};
    ret get_f(t);
}
</code></pre>

<p>So is there anything we can do that is compatible with ref. counting?
The answer is &#8220;sort of&#8221;.</p>

<h4>Returning references to the heap, take 2</h4>

<p>One thought is that we say that <code>@</code> is not actually a region.  It&#8217;s
never been the best fit, due to ref counting requirements and implicit
headers.  Instead, we say that regions always refer to some
block-scoped slice of the program execution.  The most common case
would be a block in the program, but in some cases the region might be
&#8220;the time in which a given expression is evaluated&#8221; and so forth (as
an aside: this is basically what I called an &#8220;interval&#8221; in my
thesis&#8230;minus all the parallel parts).</p>

<p>An <code>@T</code> pointer could then be implicitly coerced into an <code>r&amp;T</code> pointer
where the region <code>r</code> is the biggest region for which the type checker
can guarantee the validity of the <code>@T</code> pointer.  So, we can now
revisit the previous examples.  The first example works more-or-less
the same as before:</p>

<pre><code>fn caller_on_heap() {    // region of the block is `r`
    let t = @{mut f: 3}; // type of t: `@T`
    let p = get_f(t);    // type of p: `r&amp;mut uint`
    *p = 5;              // legal, `r` region is in scope.
}
</code></pre>

<p>The differences here lie in the type checker.  The pointer <code>p</code> is no
longer in the <code>@</code> region but rather in the region <code>r</code> corresponding to
the function body.  The reason that the region <code>r</code> could be safely
used is that <code>t</code> is an immutable local variable (I am assuming
<a href="https://github.com/mozilla/rust/issues/1273">issue #1273</a> is implemented&#8230;working on <em>that</em> right now).
This means that the memory will remain valid as long as <code>t</code> is in
scope.</p>

<p><strong>EDIT: There is no reason to impose the following restriction.  See
discussion below.</strong> This implies that if <code>t</code> were mutable, the example
would not work.  In that case, the validity of the memory referenced by <code>t</code> could
not be guaranteed for the entire block, as <code>t</code> could be overwritten.
In other words, a program might do something like this:</p>

<pre><code>fn caller_on_heap() {
    let mut t = @{mut f: 3};
    let p = get_f(t);
    t = @{mut f: 22};    // original memory is now freed
    *p = 5;              // memory error.
}
</code></pre>

<p>However, such a program would not type check.  The reason is that,
because <code>t</code> is mutable, when <code>t</code> was coerced to a region type, a
narrow region <code>s</code> would be assigned.  The region <code>s</code> would correspond
to precisely the call to <code>get_f()</code>.  The result of <code>get_f()</code> would
therefore have type <code>s&amp;mut uint</code>, but the region <code>s</code> would be out of
scope after <code>get_f()</code> returned, and so a type error occurs (this is
the rule I mentioned before: when returning a reference, the region
must be in scope).</p>

<p>As an aside, coercing a unique pointer <code>~T</code> into a region would work
similarly to the second case: that is, the resulting region is always
a very narrow one.  It does not matter if the variable storing the
unique pointer is immutable or not.  The reason is that the local
variable is being borrowed for the lifetime of the region.  If we
assigned a large region, the local variable would be inaccessible
after the call, because we would not be able to guarantee the
uniqueness invariant, as there might be escaped region-typed pointers
into its interior. Anyway, I don&#8217;t want to go into details about
unique pointers in this post as it&#8217;s already plenty long.</p>

<p><strong>EDIT:</strong> pcwalton pointed out to me that there is no reason to treat
mutable variables specially.  Instead, we can basically just increment
the ref count whenever we coerce an <code>@T</code> to a <code>r&amp;T</code>. The region <code>r</code>
would still be the region of an enclosing block <code>b</code> (probably the
innermost one, or perhaps the one where the variable is declared) and
we would release the reference upon exiting the block <code>b</code>.  We can still
optimize immutable variables to not increment the ref count at all
because it is unnecessary.  I rejected this approach initially because
I was thinking that we would want to keep it very predictable when
references would be dropped, but that&#8217;s not actually an important
property.  Garbage collection traditionally does not define precisely
when dead memory will be reclaimed, after all (and, as graydon
correctly points out, RC+CC is garbage collection).  Note though that
borrowing unique pointers probably still ought to use a narrow region
corresponding to the call or <code>alt</code> statement in which the borrow
occurs.</p>
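<p>A loose analogy, using the reference-counted <code>Rc</code> type from the standard
library of later Rust rather than the <code>@T</code> boxes discussed here: the extra
reference is taken on entry to the block and released on exit, so the
interior pointer stays valid for the whole block.  The helper name
<code>borrow_for_block</code> is made up for this sketch:</p>

```rust
use std::rc::Rc;

struct T {
    f: u32,
}

// Returns the ref count observed while the "borrow" is live, and after it ends.
fn borrow_for_block(t: &Rc<T>) -> (usize, usize) {
    let during = {
        // Entering block `b`: take an extra reference, as the coercion would.
        let keep_alive = Rc::clone(t);
        let p: &u32 = &keep_alive.f; // interior pointer, valid within `b`
        assert_eq!(*p, 3);
        Rc::strong_count(t)
        // Leaving block `b`: `keep_alive` is dropped, releasing the reference.
    };
    (during, Rc::strong_count(t))
}

fn main() {
    let t = Rc::new(T { f: 3 });
    let (during, after) = borrow_for_block(&t);
    println!("during: {}, after: {}", during, after);
}
```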

<h3>But does it scale?</h3>

<p>So, I think this system I showed above is reasonable.  I think it has
a clear story, too, which is important to me, because it helps me
believe that it is sound even though I haven&#8217;t made any kind of formal
proof or argument.  The story is basically that</p>

<ul>
<li>a region is always a block-scoped slice of the dynamic execution;</li>
<li>when a <code>@T</code> is coerced into a region pointer, the result is the largest
region for which the validity of the <code>@T</code> pointer can be guaranteed;</li>
<li>when a <code>~T</code> is coerced into a region, the result is a narrow region
corresponding to just the duration of the borrow (I haven&#8217;t gone into
details on this&#8230; perhaps in a later post)</li>
</ul>


<p>However, in all of these examples I showed only the simplest case,
where a pointer was returned directly into one of the parameters.  A
more realistic scenario is probably returning some interior pointer to
a record reachable from a parameter.  For example, a common C trick is
to have a hashtable lookup not return the value which was found but
rather a pointer to the value.  This allows the caller to update the
value without having to use any further API calls.  Let&#8217;s look at that
example: we will find that the regions don&#8217;t scale up.</p>

<p>The prototype for such a function might be:</p>

<pre><code>fn get_value_ptr&lt;K,V&gt;(m: r&amp;map&lt;K,V&gt;, k: &amp;K) -&gt; option&lt;r&amp;mut V&gt; {
    ...
}
</code></pre>

<p>This looks reasonable, but once we start digging into the details
things don&#8217;t work so well.  Let&#8217;s assume a hashmap with chains for
each key:</p>

<pre><code>enum bucket&lt;K,V&gt; = {k: K, mut v: V, mut next: option&lt;@bucket&lt;K,V&gt;&gt;};
enum map&lt;K,V&gt; = {buckets: [mut option&lt;@bucket&lt;K,V&gt;&gt;], ...};

fn get_value_ptr&lt;K,V&gt;(m: r&amp;map&lt;K,V&gt;, k: &amp;K) -&gt; option&lt;r&amp;mut V&gt; {
    let bkt_idx: uint = find_bucket_index(m, k);
    alt search_bucket_chain(m.buckets[bkt_idx], k) {
        none { none }     // no bucket with key k
        some(bkt) {       // found a bucket with key k
                          // bkt has type @bucket&lt;K,V&gt;
            some(&amp;bkt.v)  // (*) error
        }
    }
}
</code></pre>

<p>The example should be straightforward if you&#8217;ve ever coded up a
hashtable.  The tricky part is marked with a <code>(*)</code>: once we&#8217;ve found
the bucket containing the value, we attempt to return a pointer to its
interior.  But this is not safe, of course!  The caller expects a
pointer in the same region as the map: that is, with the same
lifetime.  There is no way for <code>get_value_ptr()</code> to guarantee that
<code>bkt</code> is valid as long as the map <code>m</code> is valid.  The caller might, for
example, remove the key from the hashtable, thus invalidating the
bucket.  This error manifests itself as a type error, because the type
of <code>&amp;bkt.v</code> is a region pointer corresponding to the region of the
<code>alt</code>, not the region <code>r</code> which was provided as a parameter.</p>

<h4>Could this example be made to work?</h4>

<p>I don&#8217;t think it can without a lot of work.  You could imagine that
the caller would consider the map &#8220;borrowed&#8221; while the returned value
is in scope and thus try not to modify it.  But there may be aliases
to the map, so there are still no guarantees.</p>

<p>I think there are two ways to handle an example like that in a safe
fashion:</p>

<ul>
<li>Improve the GC, as described before</li>
<li>Re-write the example into a top-down style.</li>
</ul>


<p>The second &#8220;solution&#8221; is what we support today, and what is supported
by the <a href="blog/2012/02/15/regions-lite-dot-dot-dot-ish/">&#8220;Regions Lite&#8221; idea I described before</a>.  You would write:</p>

<pre><code>fn get_value_ptr&lt;K,V,R&gt;(
    m: r&amp;map&lt;K,V&gt;,
    k: &amp;K,
    f: fn(option&lt;&amp;mut V&gt;) -&gt; R) -&gt; R {
    ...
}
</code></pre>

<p>In other words, the <code>get_value_ptr()</code> method will call the closure <code>f</code>
with the result of the lookup.  In this way, the <code>get_value_ptr()</code>
method itself can guarantee that the reference is valid.</p>
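<p>Here is a small runnable sketch of the same top-down shape, in later Rust
syntax and over the standard library <code>HashMap</code> instead of the map type
above.  The function name <code>with_value_ptr</code> and the concrete key and value
types are assumptions made to keep the example short:</p>

```rust
use std::collections::HashMap;

// Top-down style: instead of returning a pointer into the map, call `f`
// with the result of the lookup while the map is still borrowed.
fn with_value_ptr<R>(
    m: &mut HashMap<String, u32>,
    k: &str,
    f: impl FnOnce(Option<&mut u32>) -> R,
) -> R {
    f(m.get_mut(k))
}

fn main() {
    let mut m = HashMap::new();
    m.insert("a".to_string(), 1);

    // Update the value in place without any further API calls.
    let found = with_value_ptr(&mut m, "a", |v| match v {
        Some(v) => {
            *v += 41;
            true
        }
        None => false,
    });
    println!("{} {}", found, m["a"]);
}
```

<p>The reference handed to the closure cannot outlive the call, so the
caller has no way to remove the key while still holding it.</p>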

<h3>Is it all worth it?</h3>

<p>It may still be worth having labeled region parameters and supporting
a limited form of returning references, but I am not sure.  I feel
like writing things in a &#8220;top-down&#8221;, CPS-ish style is perhaps just a
better solution&#8212;it&#8217;s certainly more widely applicable.</p>

<p>Regardless, I do like the idea of formalizing regions as representing
a slice of time, and defining coercions from <code>@T</code> and <code>~T</code>.  I think
that feels intuitively sensible and I think it can be explained in a
reasonable way.</p>

<p>There is however one potentially important future case where the
simple form of returning references would be enough.  If we had
user-defined memory pools, then it would be possible to have large,
dynamically allocated, multi-object data structures that all lie
within one region (the memory pool).  A map, for example, might have
an associated arena.  This scheme would then work.  When you think
about it, treating <code>@</code> as a region and having a GC which can handle
interior pointers is really the same thing as this scheme, from an
abstract point of view.</p>

<p>Another important thing is that I think there is a path from all of
these more limited forms to the more general ones.  If we say, for
now, that <code>@</code> is not a region and we do not support labeled
parameters, we can add both of those later. Existing programs will
continue to type check.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Regions-lite...ish]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/02/15/regions-lite-dot-dot-dot-ish/"/>
    <updated>2012-02-15T09:51:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/02/15/regions-lite-dot-dot-dot-ish</id>
    <content type="html"><![CDATA[<p>I was talking to brson today about the possibility of moving Rust to a
regions system.  He pointed out that the complexity costs may be high.
I was trying to make a slimmer version where explicit region names
were never required.  This is what I came up with.  The truth is, it&#8217;s
not that different from the original: adding back region names wouldn&#8217;t
change much.  But I&#8217;m posting it anyway because it includes a description
of how to handle regions in types and I think it&#8217;s the most complete and
correct proposal at the moment.</p>

<h2>The summary</h2>

<p>You would have four kinds of Rust pointers:</p>

<pre><code>@MT --- pointers to task-local, boxed data
~MT --- pointers to unique data
&amp;MT --- safe references
*MT --- unsafe, C-like pointers
</code></pre>

<p>Here the <code>M</code> refers to a mutability qualifier (default, <code>mut</code>, or
<code>const</code>) and <code>T</code> refers to a type.</p>

<p><code>&amp;MT</code> types, called references, are the new addition.  A reference is
a pointer which always points at memory whose validity is guaranteed
by some outer stack frame.  The idea is that a caller can give a
callee a reference to some memory that the callee may use but which
may not escape the callee.  This memory may be on the caller&#8217;s stack
frame or it may be a reference into the task or exchange heaps which
the caller is going to keep valid.  This guarantee is upheld by the
type system.</p>

<p>Reference types may appear anywhere.  However, if they are used within
another aggregate type such as a record, enum, or class, they &#8220;infect&#8221;
their container so that it too is considered to be a reference.  This
is done by introducing a new kind into the type system, <code>ref</code>
(actually, this is sort of a negative kind: more formally, there is a
kind <code>heap</code> which contains all types but for those that may
transitively include a reference).  This kind may or may not be user
visible: see the section on generics for a discussion of the options.</p>

<h2>Coercion between pointer types</h2>

<p>The type <code>&amp;MT</code> is not a supertype of <code>@MT</code> and <code>~MT</code>, but both can
be coerced to it.  In the case of <code>@</code>, we could probably make it a true
subtype, but at the moment a box pointer includes a header, ref count,
etc., and so is not binary compatible with a <code>&amp;</code> pointer, which would be
just a pointer to the box body.  If we changed our representation so
that <code>@</code> pointers point directly to the box and the header is stored
at a negative offset, then we could allow <code>@T</code> to be a subtype of
<code>&amp;T</code>.</p>

<p>The type <code>~MT</code>, however, can never be a subtype.  <code>~</code> is not a region.
Rather, the data at the other end of the pointer logically belongs to
a region of its own.  So we can allow <code>~MT</code> to be coerced to <code>&amp;MT</code>,
but the region will be a fresh region, and access to the <code>~MT</code> pointer
must be prevented for the scope of that fresh region.  This is called
&#8220;borrowing&#8221; a unique pointer.  It is only possible for &#8220;unique paths&#8221;,
where a &#8220;unique path&#8221; is a path of identifiers <code>a.b.c...z</code> that is the
only path by which the unique variable can be reached (in practice,
this means that <code>a</code> must be a local variable and all of the fields
<code>b...z</code> must have unique or interior type).  All of the prefixes of
the unique path must be considered borrowed as well.  I am not going
into great detail on the handling of uniques here: it should be quite
similar to what we have today in practice.</p>

<h2>Tracking validity of references</h2>

<p>Although the user never needs to write it explicitly, each instance of
a reference type is internally associated with a region.  There is one
region for every block in the code.  In addition, each function/method
has a special region called <code>caller</code>.  For simplicity I do not
consider classes or impls; it is relatively straightforward to extend
the system to such cases.</p>

<p>Regions are arranged into a tree derived from the structure of the
blocks in the source code.  The region <code>caller</code> is a superregion of
all the internal regions to a function.</p>

<p>In the implementation / formal version of the type system, these
regions are represented explicitly.  So a user-written type <code>&amp;MT</code>
expands to a type <code>r&amp;MT</code> where <code>r</code> is the node id of the block or of
the function itself (in the case of the <code>caller</code> region).  The region <code>r</code>
is derived from the position where <code>&amp;MT</code> appears and by inference:</p>

<ul>
<li>if <code>&amp;MT</code> appears within a parameter list, <code>r</code> is the <code>caller</code> region.</li>
<li>if <code>&amp;MT</code> appears on the type of a local variable, inference is used.</li>
<li>if <code>&amp;MT</code> appears in a type declaration, see section below.</li>
</ul>


<p>In general, the type <code>a&amp;T</code> is a subtype of <code>b&amp;T</code> if <code>b</code> is a subregion
of <code>a</code>.  The reason is that, because <code>a</code> is a superregion of <code>b</code>, the
pointer <code>a&amp;T</code> is always valid whenever the region <code>b</code> is valid.</p>
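<p>The same rule can be sketched with explicit lifetimes in later Rust
syntax (again, not the notation of this proposal).  The bound
<code>'a: 'b</code> says that <code>'a</code> is the superregion, and the hypothetical
function <code>shorten</code> type checks only because a pointer valid for
<code>'a</code> may be used wherever one valid for <code>'b</code> is expected:</p>

```rust
// `'a: 'b` reads "`'a` outlives `'b`": `'a` is the superregion, `'b` the subregion.
fn shorten<'a: 'b, 'b>(x: &'a u32) -> &'b u32 {
    x // accepted: a pointer valid for `'a` is valid for any region inside it
}

fn main() {
    let outer = 7; // lives in the outer block
    let seen = {
        let inner = 1; // lives only in the inner block
        let p = shorten(&outer);
        *p + inner
    };
    println!("{}", seen);
}
```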

<h3>References in type declarations</h3>

<p>The rules for which region is assigned when the user writes <code>&amp;MT</code>
omitted one important case: what happens when this type appears in a
type declaration?  Consider the following example:</p>

<pre><code>type crate_ctxt = {
    mut_map: &amp;map&lt;...&gt;,
    node_map: &amp;map&lt;...&gt;,
    another_map: &amp;map&lt;...&gt;,
    yet_another_map: &amp;map&lt;...&gt;
};
</code></pre>

<p>In such cases, the region for the internal references will be assigned
when the type is used.  For example:</p>

<pre><code>fn trans_foo(ccx: &amp;crate_ctxt) {...}
</code></pre>

<p>Here, the type of <code>ccx</code> will be expanded to:</p>

<pre><code>caller&amp;{mut_map: caller&amp;map&lt;...&gt;, node_map: caller&amp;map&lt;...&gt;, ... }
</code></pre>

<p>In effect, types which contain references (transitively) are
implicitly parameterized by a region parameter.  There is only one
such parameter.  When the type is instantiated in a specific context,
the value for that parameter is provided based on the context.</p>

<h2>Taking the address of variables and so forth</h2>

<p>The unary operator <code>&amp;M</code> can be used with both lvalues and rvalues.
When used with an lvalue, it takes the address of the lvalue.  The
mutability qualifier provided must agree with the mutability of the
lvalue.  When used with an rvalue, it creates temporary space on the
stack and copies the rvalue into it.</p>

<p>Here is an example of taking the address of lvalues:</p>

<pre><code>fn foo() { // region for this block is "r"
    let x = 3;
    let mut y = 4;
    let px1 = &amp;x;       // OK: yields type r&amp;int
    let px2 = &amp;const x; // OK: yields type r&amp;const int
    let px3 = &amp;mut x;   // Error: x is immutable
    let py1 = &amp;y;       // Error: y is mutable.
    let py2 = &amp;const y; // OK: yields type r&amp;const int
    let py3 = &amp;mut y;   // OK: yields type r&amp;mut int
}
</code></pre>

<p>Here is an example of taking the address of rvalues:</p>

<pre><code>fn foo() { // region for this block is "r"
    let p1 = &amp;{x: 3, y: 4}; // OK: yields type r&amp;{x:int,y:int}
    let p2 = &amp;mut {x: 3, y: 4}; // OK: yields type r&amp;mut {x:int,y:int}
}
</code></pre>

<h2>Limitations on references</h2>

<p>In order to guarantee that reference types do not escape the callee,
the type system imposes some limitations:</p>

<ul>
<li>Reference types may not be returned.</li>
<li>Reference types may not be closed over (copied/moved into a closure
or interface instance).</li>
<li>Generic type variables cannot be bound to reference types unless
the generic type variable is of the <code>ref</code> kind.</li>
</ul>


<p>I will cover each restriction in turn.  First, though, I want to more
precisely define what the type checker considers to be a reference
type.  The definition is inductive:</p>

<ul>
<li>a reference <code>&amp;MT</code>;</li>
<li>a type whose definition may contain a reference (e.g., <code>@&amp;T</code> or
<code>{x: &amp;T}</code>, or a class with a field of reference type);</li>
<li>a generic variable with bound <code>ref</code>.</li>
</ul>


<h3>Reference types may not be returned</h3>

<p>The danger here is that the callee may pass back a reference to the
caller that is no longer valid.  This is relatively straightforward to
prevent: do not allow the return type of a function to be a reference
type.</p>

<h3>Reference types may not be closed over</h3>

<p>It is not allowed to copy a reference type into a boxed/unique closure
nor is it allowed to cast a reference type to a boxed or unique iface.
The reason is that these are the points where the type system loses
the ability to track the constituent types and so we cannot
distinguish a <code>fn@()</code> that closes over a reference type from other
<code>fn@()</code> types.</p>

<h3>Generic types</h3>

<p>There is of course a concern that the limitations on reference types
might be circumvented through the use of generics.  This is prevented
through the use of a type kind <code>ref</code>.  A generic type variable may not
be bound to a reference type unless it includes the bound <code>ref</code>.
Moreover, any generic type variables bound by <code>ref</code> are considered
reference types and therefore must obey the above restrictions.</p>

<h2>A note on variance</h2>

<p>In general, pointer types like <code>&amp;MT</code> or <code>@MT</code> are covariant in T if <code>M</code> is
not <code>mut</code>.  This is different from references today which are always
covariant in T; the current behavior is what leads to the type hole
pointed out in the mailing list.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Using futures in the task API]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/02/14/using-futures-in-the-task-api/"/>
    <updated>2012-02-14T10:14:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/02/14/using-futures-in-the-task-api</id>
    <content type="html"><![CDATA[<p>Brian pointed out to me a nice solution to the Task API problem that I
had overlooked, though it&#8217;s fairly obvious.  Basically, I had
rejected a &#8220;builder&#8221; style API for tasks because there is often a need
for the child task to be able to send some data back to its parent
after it has been spawned, and a builder API cannot easily accommodate
this.  Brian&#8217;s idea was to encapsulate these using futures.  It&#8217;s
still not perfect but it&#8217;s better I think and more composable than my
first, limited proposal.  It still requires that the actor pattern be
a separate module.</p>

<p>For those of you who don&#8217;t care about the intimate details of Rust
task generation, sorry, this is a kind of a &#8220;document the idea&#8221; sort
of post.  I&#8217;m sure I&#8217;m also brushing over some of the Rust-specific
context that might be needed to make the examples easy to understand.</p>

<h3>Builder-based idea</h3>

<p>Our basic task builder data types look like:</p>

<pre><code>type task_builder = ~{
    stack_size: uint,
    notify_chan: comm::chan&lt;task_result&gt;,
    ... various options ...
    gen_task_body: fn@(fn~()) -&gt; fn~()
};
type task_id = uint;
enum task_result { tr_success, tr_failure };
</code></pre>

<p>Basically it&#8217;s a struct with a bunch of options.  The most interesting
part is the <code>gen_task_body()</code> field, which contains a closure
that&#8212;given the user&#8217;s task body&#8212;will return the real task body.
This allows us to accumulate transformations on the body.</p>

<p>Creating a builder implicitly sets it up with the default options:</p>

<pre><code>fn mk_task_builder() -&gt; task_builder {
    fn identity(f: fn~()) -&gt; fn~() { f }

    ret ~{ ... default_options ..., gen_task_body: identity };
}
</code></pre>

<p>Then people can add <code>impl</code> methods to configure the builder.  Here is
one simple example:</p>

<pre><code>impl task_builder for task_builder {
    fn set_stack_size(ss: uint) {
        self.stack_size = ss;
    }
}
</code></pre>

<p>Here is an example of how we could make tasks that send a message to
a channel when they fail:</p>

<pre><code>impl task_builder for task_builder {
    fn notify_chan_on_failure(ch: chan&lt;task_result&gt;) {
        self.gen_task_body = fn@(body: fn~()) -&gt; fn~() {
            fn~[copy ch, move body]() {
                let _ = resource send_final_msg {
                    comm::send(
                        ch,
                        if rt::is_failing {tr_failure} else {tr_success})
                }
                body();
                rt::await_all_children();
            }
        };
    }
}
</code></pre>

<p>The effect of this code is to replace the <code>gen_task_body</code> closure with
one that will wrap the user&#8217;s supplied body.  The wrapper will await
all children created by the body and send a message at the end.  The
message will indicate whether it failed or not.  The final message
send is written using an inline resource (no syntax exists for this,
but I just didn&#8217;t want to write out the full resource declaration that
is currently required).</p>

<p>This is a pretty basic mechanism.  It could be wrapped up to be a bit
more widely applicable using wrappers like so:</p>

<pre><code>impl task_builder for task_builder {
    fn make_joinable() -&gt; future&lt;task_result&gt; {
        let port = comm::port();
        self.notify_port_on_failure(port);
        future::from_port(port)
    }

    fn notify_port_on_failure(p: port&lt;task_result&gt;) {
        self.notify_chan_on_failure(comm::chan(p));
    }
}
</code></pre>

<p>Finally, the spawn method looks like this:</p>

<pre><code>fn spawn(-builder: task_builder, body: fn~()) -&gt; task_id {
    let body = builder.gen_task_body(body);
    ret rt::spawn(builder, body); // do the *actual* spawn
}
</code></pre>

<p>One interesting design choice was to make the <code>task_builder</code> unique
and have it be consumed by spawn.  The idea was to prevent people from
using the same configuration to launch multiple tasks.  This is not
safe in general though it may be in some cases: people can still
explicitly <code>copy</code> the builder if desired.</p>
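
<p>For comparison, here is a rough sketch of the same builder idea in
present-day Rust rather than the dialect used above.  It is only an
illustration under some assumptions: <code>std::thread</code> stands in for tasks,
<code>std::sync::mpsc</code> stands in for <code>comm</code>, and the names <code>TaskBuilder</code>,
<code>Notify</code> and <code>notify_chan_on_exit</code> are made up for this example.  There
is no equivalent of <code>rt::await_all_children()</code> here.</p>

<pre><code>use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

type Body = Box&lt;dyn FnOnce() + Send + 'static&gt;;

#[derive(Debug, PartialEq)]
enum TaskResult { Success, Failure }

// Sends the final message when dropped, even if the body panicked.
struct Notify(Sender&lt;TaskResult&gt;);

impl Drop for Notify {
    fn drop(&amp;mut self) {
        let r = if thread::panicking() { TaskResult::Failure } else { TaskResult::Success };
        let _ = self.0.send(r); // the receiver may already be gone
    }
}

struct TaskBuilder {
    stack_size: usize,
    // Given the user's body, returns the real task body.
    gen_task_body: Box&lt;dyn FnOnce(Body) -&gt; Body + Send&gt;,
}

impl TaskBuilder {
    fn new() -&gt; TaskBuilder {
        TaskBuilder { stack_size: 2 * 1024 * 1024, gen_task_body: Box::new(|body: Body| body) }
    }

    fn set_stack_size(mut self, ss: usize) -&gt; TaskBuilder {
        self.stack_size = ss;
        self
    }

    // Layer a notification wrapper on top of whatever was accumulated so far.
    fn notify_chan_on_exit(mut self, ch: Sender&lt;TaskResult&gt;) -&gt; TaskBuilder {
        let prev = self.gen_task_body;
        self.gen_task_body = Box::new(move |body: Body| -&gt; Body {
            let inner = prev(body);
            Box::new(move || {
                let _guard = Notify(ch);
                inner();
            })
        });
        self
    }

    fn make_joinable(self) -&gt; (TaskBuilder, Receiver&lt;TaskResult&gt;) {
        let (tx, rx) = channel();
        (self.notify_chan_on_exit(tx), rx)
    }

    // Takes the builder by value, so one configuration launches one task.
    fn spawn(self, body: impl FnOnce() + Send + 'static) -&gt; JoinHandle&lt;()&gt; {
        let body: Body = Box::new(body);
        let real_body = (self.gen_task_body)(body);
        thread::Builder::new()
            .stack_size(self.stack_size)
            .spawn(real_body)
            .expect("failed to spawn thread")
    }
}

fn main() {
    let (builder, done) = TaskBuilder::new().set_stack_size(1024 * 1024).make_joinable();
    let handle = builder.spawn(|| println!("hello from the task"));
    handle.join().unwrap();
    println!("task finished with {:?}", done.recv().unwrap());
}
</code></pre>

<p>Because <code>spawn</code> takes <code>self</code> by value, reusing a builder is a compile-time
error, which is the same effect the unique <code>task_builder</code> is meant to have.</p>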

<h3>Wrapping spawn: The actor module</h3>

<p>Sometimes there are cases, like the actor module, where you want to
spawn the task using a different sort of body than a <code>fn~()</code>.  The
actor module, for example, expects a body <code>fn~(port&lt;A&gt;)</code> that is
provided with a port.  The idea is that you will spawn a task that
creates a port for itself and then sends a channel to that port back
to its creator.  This is effectively a wrapper around <code>task::spawn</code>:</p>

<pre><code>mod actor {
    enum actor&lt;A&gt; = { t_id: task::task_id, ch: chan&lt;A&gt; };
    fn spawn&lt;A&gt;(-builder: task_builder, body: fn~(port&lt;A&gt;)) -&gt; actor&lt;A&gt; {
        let tmp_port = comm::port();
        let tmp_chan = comm::chan(tmp_port);
        let t_id = task::spawn(builder, fn~[copy tmp_chan, move body]() {
            let port = comm::port();
            comm::send(tmp_chan, comm::chan(port));
            body(port);
        });
        let ch = comm::recv(tmp_port);
        actor({ch: ch, t_id: t_id})
    }
}
</code></pre>

<p>It would actually be <em>possible</em> to move the actor body stuff into the
builder, but the resulting module is a little weird and error-prone to
use.  Basically it ends up being another kind of thing that wraps
<code>gen_task_body()</code>, and the danger is that if the user doesn&#8217;t invoke
the notify wrappers first, you get in a situation where the actor code
is executing before the notification wrappers get set up.  Probably not
what you wanted.  So we decided it&#8217;s better to separate out spawn
functions, which provide the real body of the task, from configuration
wrappers.  There may still be ordering dependencies between wrappers but
there are no doubt fewer of them.</p>
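
<p>To make the handshake in the actor wrapper concrete, here is a small
sketch in present-day Rust.  It assumes <code>std::thread</code> in place of tasks
and <code>std::sync::mpsc</code> in place of <code>comm</code>; <code>Actor</code> and <code>spawn_actor</code>
are names invented for the example, and the builder argument is left out
to keep it short.</p>

<pre><code>use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

struct Actor&lt;A&gt; {
    handle: JoinHandle&lt;()&gt;,
    ch: Sender&lt;A&gt;,
}

fn spawn_actor&lt;A, F&gt;(body: F) -&gt; Actor&lt;A&gt;
where
    A: Send + 'static,
    F: FnOnce(Receiver&lt;A&gt;) + Send + 'static,
{
    // Temporary channel, used only to carry the child's mailbox back.
    let (tmp_tx, tmp_rx) = channel::&lt;Sender&lt;A&gt;&gt;();
    let handle = thread::spawn(move || {
        // The child creates its own mailbox and sends the sending half home.
        let (tx, rx) = channel::&lt;A&gt;();
        tmp_tx.send(tx).expect("parent went away");
        body(rx);
    });
    let ch = tmp_rx.recv().expect("child exited before sending its channel");
    Actor { handle, ch }
}

fn main() {
    let actor = spawn_actor(|port: Receiver&lt;i32&gt;| {
        // Runs until every sender has been dropped.
        let total: i32 = port.iter().sum();
        println!("total = {}", total);
    });
    for i in 1..=3 {
        actor.ch.send(i).unwrap();
    }
    let Actor { handle, ch } = actor;
    drop(ch); // closing the mailbox lets the actor finish
    handle.join().unwrap();
}
</code></pre>

<p>With these channels the parent could just as well create the mailbox itself
and hand the receiving end to the child; the round trip is kept only to
mirror the code above.</p>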

<h3>The future module</h3>

<p>All of this kind of assumes a simple future module which I think looks like:</p>

<pre><code>mod future {
    enum future&lt;A&gt; = {
        mutable v: either&lt;A,port&lt;A&gt;&gt;
    };

    impl future&lt;A&gt; for future&lt;A&gt; {
        fn get() -&gt; A {
            alt self.v {
                either::left(v) { v }
                either::right(p) {
                    let v = comm::recv(p);
                    self.v = either::left(v);
                    ret v;
                }
            }
        }
    }

    fn from_port&lt;A&gt;(p: port&lt;A&gt;) -&gt; future&lt;A&gt; {
        future({v: either::right(p)})
    }

    fn from_value&lt;A&gt;(v: A) -&gt; future&lt;A&gt; {
        future({v: either::left(v)})
    }
}
</code></pre>

<p>To make futures more convenient to use, a function like</p>

<pre><code>mod future {
    fn spawn&lt;A&gt;(f: fn~() -&gt; A) -&gt; future&lt;A&gt; {
        let port = comm::port();
        let chan = comm::chan(port);
        let builder = task::mk_task_builder();
        task::spawn(builder, fn~[move f, chan]() {
            let v = f();
            comm::send(chan, v);
        });
        from_port(port)
    }
}
</code></pre>

<p>would let you write code like:</p>

<pre><code>let f = future::spawn {|| some_expensive_computation() };
...
let r = f.get();
</code></pre>
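
<p>For readers who want something they can compile today, the following
is a minimal sketch of the same future in present-day Rust.  It assumes
<code>std::sync::mpsc::Receiver</code> as the port and <code>std::thread</code> as the task;
the variant names <code>Ready</code> and <code>Pending</code> are mine, and requiring
<code>A: Clone</code> is a simplification so that <code>get()</code> can be called more than
once.</p>

<pre><code>use std::sync::mpsc::{channel, Receiver};
use std::thread;

enum Future&lt;A&gt; {
    Ready(A),
    Pending(Receiver&lt;A&gt;),
}

impl&lt;A: Clone&gt; Future&lt;A&gt; {
    fn from_value(v: A) -&gt; Future&lt;A&gt; {
        Future::Ready(v)
    }

    fn from_port(p: Receiver&lt;A&gt;) -&gt; Future&lt;A&gt; {
        Future::Pending(p)
    }

    // Blocks on the first call if the value has not arrived yet,
    // then caches it so later calls return immediately.
    fn get(&amp;mut self) -&gt; A {
        let v = match &amp;*self {
            Future::Ready(v) =&gt; return v.clone(),
            Future::Pending(p) =&gt; p.recv().expect("producer exited without sending a value"),
        };
        *self = Future::Ready(v.clone());
        v
    }
}

fn spawn&lt;A, F&gt;(f: F) -&gt; Future&lt;A&gt;
where
    A: Clone + Send + 'static,
    F: FnOnce() -&gt; A + Send + 'static,
{
    let (tx, rx) = channel();
    thread::spawn(move || {
        let _ = tx.send(f()); // nobody may be waiting any more
    });
    Future::from_port(rx)
}

fn main() {
    let mut f = spawn(|| (1..=10u64).product::&lt;u64&gt;());
    let mut g = Future::from_value(1u64);
    println!("10! = {}, constant = {}", f.get(), g.get());
}
</code></pre>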

<p>Hooray.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Task API]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/02/13/task-api/"/>
    <updated>2012-02-13T16:39:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/02/13/task-api</id>
    <content type="html"><![CDATA[<p>One of the thorny API problems I&#8217;ve been thinking about lately is the
task API for Rust.  I originally had in mind this fancy and very
flexible approach based on bind.  When I spelled it out I found it was
very powerful and flexible but also completely unworkable in practice.</p>

<p>So here is a more limited proposal.  There is a core task API that
looks something like this:</p>

<pre><code>enum task = uint; // wrap the task ID or whatever
type opts = { ... };

fn default_opts() -&gt; opts;

fn spawn(opts: opts, body: fn~()) -&gt; task;
</code></pre>

<p>The options struct will let you control simple things like stack size
and so forth.</p>

<p>On top of this there will be several patterns.  These probably live in
separate modules.  For example, an <code>actor</code> pattern which starts up a
task with its own mailbox:</p>

<pre><code>enum actor&lt;A&gt; = {
    task: task::task,
    chan: comm::chan&lt;A&gt;
}

impl actor&lt;A&gt; for actor&lt;A&gt; {
    fn send(msg: A) { comm::send(self.chan, msg) }
}

fn spawn&lt;A,R&gt;(
    opts: task::opts,
    body: fn~(p: comm::port&lt;A&gt;) -&gt; R) -&gt; actor&lt;A&gt; {

    let pp_tmp = comm::port();
    let ch_tmp = comm::chan(pp_tmp);
    let t = task::spawn(opts) {||
        let p = comm::port();
        comm::send(ch_tmp, comm::chan(p));
        body(p);
    };
    let ch = comm::recv(pp_tmp);
    actor({task: t, chan: ch})
}
</code></pre>

<p>Or a futures-like pattern that spawns off a task and allows you to
invoke a <code>get()</code> method to read its result:</p>

<pre><code>enum future&lt;R&gt; = {
    task: task::task,
    mutable rslt: either&lt;R, comm::port&lt;R&gt;&gt;
}

impl future&lt;R&gt; for future&lt;R&gt; {
    fn get() -&gt; R {
        alt self.rslt {
            either::left(r) { r }
            either::right(p) {
                let r = comm::recv(p);
                self.rslt = either::left(r);
                ret r;
            }
        }
    }
}

fn spawn&lt;R&gt;(
    opts: task::opts,
    body: fn~() -&gt; R) -&gt; future&lt;R&gt; {

    let pp = comm::port();
    let ch = comm::chan(pp);
    let t = task::spawn(opts) {||
        comm::send(ch, body())
    };
    future({task: t, rslt: either::right(pp)})
}
</code></pre>

<p>The downside of this approach is that it is not particularly
composable (you can&#8217;t, for example, combine a future and actor
together in one task without writing your own code). However, it&#8217;s
easy enough to understand and it&#8217;s probably good enough.</p>

<p>Hmm, that last sentence makes me wonder if classes and traits could help
here.  Oh well, a thought for another day.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Auto-serialization in Rust]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/02/09/auto-serialization-in-rust/"/>
    <updated>2012-02-09T18:34:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/02/09/auto-serialization-in-rust</id>
    <content type="html"><![CDATA[<p>I&#8217;ve been working on implementing <a href="https://github.com/mozilla/rust/issues/1765">Cross-Crate Inlining</a>.  The
major task here is to serialize the AST.  This is conceptually trivial
but in practice a major pain.  It&#8217;s an interesting fact that the more
tightly you type your data, the more of a pain it (generally) is to
work with in a generic fashion.  Of functional-ish languages that I&#8217;ve
used, Scala actually makes things relatively easy by using a
combination of reflection and dynamic typing (interfaces like
<a href="http://www.scala-lang.org/api/current/index.html#scala.Product"><code>Product</code></a> come to mind).</p>

<p>Anyway, Rust does not (yet?) have reflection, but I have been working
on a program which will autogenerate the serialization code for our
AST based on the type definitions themselves.  Normally, I would probably
do this with some Python program and a bunch of hacky regular
expressions.  But instead I am taking advantage of one of Rust&#8217;s nicer
(and somewhat unusual, although becoming less so) features: the fact
that the Rust compiler is itself a library. <em>(An aside: I plan to
implement this serialization code as a syntax extension or macro once
those systems mature.)</em></p>

<p>To use <code>serializer</code>, you provide it with a crate file and a set of
type names.  It will then generate Rust code that serializes instances
of those types. Internally, it invokes the compiler to parse and type
check the crate, using the <code>compile_upto()</code> function, which allows you
to compile a given input up until a certain point (in this case, up
until the type checking phase has completed).</p>

<p><em>An aside:</em> This is the point where the beauty of crate files becomes more
apparent: a crate is a self-contained specification that not only
contains a listing of the source modules and so forth, but also the
external crates that are required, default compilation options, etc.
Having all of this mess encapsulated in a crate means that it is
trivial for a tool like <code>serializer</code> to recreate the compilation
environment for your package: just provide it with a crate file.  If
this were a C program, you&#8217;d also have to supply a random smattering
of gcc options, which you would in turn have to figure out how to
extract from your makefile, not to mention the makefiles from external
packages that you are using.  Ugh.</p>

<p>Once <code>serializer</code> has parsed and type-checked your source, it is
provided with a crate AST and a type context (<code>ty::ctxt</code>).  Using
these two things, it&#8217;s fairly straightforward to locate the
definitions for the types we are supposed to serialize and walk over
them, generating code as we go.</p>

<p>The actual code works by walking <code>ty::t</code> instances.  <code>ty::t</code> is the
type used in the Rust compiler to represent types.  This is distinct
from <code>ast::ty</code>, which is the syntax tree that represents a type.
<code>ty::t</code> is modeled after the type system in the abstract, which makes
it easier to work with.  The other reason to walk <code>ty::t</code> instances
and not <code>ast::ty</code> is that there is no AST available for types defined
in external crates (such as <code>option::t</code>, defined in <code>libcore</code>).</p>

<p>Basically, for each unique <code>ty::t</code> that we encounter we generate a function
of the form:</p>

<pre><code>fn serialize&lt;C: serialization::ctxt&gt;(cx: C, t: T) {
    ...
}
</code></pre>

<p>Here <code>T</code> is the type represented by the <code>ty::t</code>.  The variable <code>cx</code> is
a serialization context.  This is defined using an interface
<code>serialization::ctxt</code>, which looks like so:</p>

<pre><code>mod serialization {
    iface ctxt {
        fn emit_u64(x: u64);
        fn emit_i64(x: i64);

        fn emit_record(f: fn());
        fn emit_field(f_name: str, f_id: uint, f: fn());

        fn emit_enum(e_name: str, f: fn());
        fn emit_variant(v_name: str, v_id: uint, f: fn());

        ...
    }
}
</code></pre>

<p>So, for example, the serialization function for a type <code>{x: uint, y: uint}</code>
would look something like:</p>

<pre><code>fn serialize1&lt;C: serialization::ctxt&gt;(cx: C, &amp;&amp;v: {x: uint, y: uint}) {
    cx.emit_record {||
        cx.emit_field("x", 0) {||
            cx.emit_u64(v.x as u64);
        }
        cx.emit_field("y", 1) {||
            cx.emit_u64(v.y as u64);
        }
    }
}
</code></pre>

<p>Now, to deserialize, we generate similar code for a deserialization interface:</p>

<pre><code>fn deserialize1&lt;C: deserialization::ctxt&gt;(cx: C) -&gt; {x: uint, y: uint} {
    cx.read_record {||
        let x = cx.read_field("x", 0) {||
            cx.read_u64() as uint
        };
        let y = cx.read_field("y", 1) {||
            cx.read_u64() as uint
        };
        {x: x, y: y}
    }
}
</code></pre>

<p>The deserialization interface looks like:</p>

<pre><code>mod deserialization {
    iface ctxt {
        fn read_u64() -&gt; u64;
        fn read_i64() -&gt; i64;

        fn read_record&lt;T&gt;(f: fn() -&gt; T) -&gt; T;
        fn read_field&lt;T&gt;(f_name: str, f_id: uint, f: fn() -&gt; T) -&gt; T;

        fn read_enum&lt;T&gt;(f: fn(uint) -&gt; T) -&gt; T;

        ...
    }
}
</code></pre>

<p>A somewhat more interesting case concerns enums.  Let&#8217;s consider the
enum <code>option&lt;R&gt;</code> where <code>R</code> is the record type we&#8217;ve been working with.
It would be serialized as:</p>

<pre><code>type R = {x: uint, y: uint};
fn serialize2&lt;C: serialization::ctxt&gt;(cx: C, &amp;&amp;v: option&lt;R&gt;) {
    cx.emit_enum("std::option::t&lt;R&gt;") {||
        alt v {
            none {
                cx.emit_variant("std::option::none", 0u) {||
                }
            }
            some(r) {
                cx.emit_variant("std::option::some", 1u) {||
                    serialize1(cx, r); // link to the previous code we saw
                }
            }
        }
    }
}
</code></pre>

<p>The deserializer meanwhile would look like:</p>

<pre><code>fn deserialize2&lt;C: deserialization::ctxt&gt;(cx: C) -&gt; option&lt;uint&gt; {
    cx.read_enum {|v_id|
        alt v_id {
            0u { // std::option::none
                std::option::none
            }

            1u { // std::option::some
                std::option::some(deserialize1(cx))
            }

            _ {
                fail #fmt["Unexpected discriminant %u for option::option",
                    v_id];
            }
        }
    }
}
</code></pre>
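
<p>To give a feel for how the generated code and the context interface
fit together, here is a small, hand-written sketch in present-day Rust.
It is not the output of <code>serializer</code>: the trait <code>SerCtxt</code>, the
<code>Trace</code> context (which just records the calls it receives) and the
function names are all invented for the example, and the context is passed
into each closure explicitly because of how borrowing works in Rust today.</p>

<pre><code>trait SerCtxt {
    fn emit_u64(&amp;mut self, x: u64);
    fn emit_record(&amp;mut self, f: impl FnOnce(&amp;mut Self));
    fn emit_field(&amp;mut self, name: &amp;str, idx: usize, f: impl FnOnce(&amp;mut Self));
    fn emit_enum(&amp;mut self, name: &amp;str, f: impl FnOnce(&amp;mut Self));
    fn emit_variant(&amp;mut self, name: &amp;str, id: usize, f: impl FnOnce(&amp;mut Self));
}

// A toy context that records each emit call as a token.
struct Trace(Vec&lt;String&gt;);

impl SerCtxt for Trace {
    fn emit_u64(&amp;mut self, x: u64) {
        self.0.push(format!("u64({})", x));
    }
    fn emit_record(&amp;mut self, f: impl FnOnce(&amp;mut Self)) {
        self.0.push("record{".to_string());
        f(self);
        self.0.push("}".to_string());
    }
    fn emit_field(&amp;mut self, name: &amp;str, _idx: usize, f: impl FnOnce(&amp;mut Self)) {
        self.0.push(format!("{}:", name));
        f(self);
    }
    fn emit_enum(&amp;mut self, _name: &amp;str, f: impl FnOnce(&amp;mut Self)) {
        f(self);
    }
    fn emit_variant(&amp;mut self, name: &amp;str, id: usize, f: impl FnOnce(&amp;mut Self)) {
        self.0.push(format!("variant({}#{})", name, id));
        f(self);
    }
}

struct R { x: usize, y: usize }

// What a generated serializer for the record type might look like.
fn serialize_r&lt;C: SerCtxt&gt;(cx: &amp;mut C, v: &amp;R) {
    cx.emit_record(|cx| {
        cx.emit_field("x", 0, |cx| cx.emit_u64(v.x as u64));
        cx.emit_field("y", 1, |cx| cx.emit_u64(v.y as u64));
    });
}

// The enum case delegates to the record serializer for its payload.
fn serialize_opt_r&lt;C: SerCtxt&gt;(cx: &amp;mut C, v: &amp;Option&lt;R&gt;) {
    cx.emit_enum("Option&lt;R&gt;", |cx| match v {
        None =&gt; cx.emit_variant("None", 0, |_| {}),
        Some(r) =&gt; cx.emit_variant("Some", 1, |cx| serialize_r(cx, r)),
    });
}

fn main() {
    let mut cx = Trace(Vec::new());
    serialize_opt_r(&amp;mut cx, &amp;Some(R { x: 3, y: 4 }));
    println!("{}", cx.0.join(" "));
}
</code></pre>

<p>A real context would write bytes (EBML, say) instead of tokens, but the
generated functions would not need to change.</p>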
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Breaking out is hard to do]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/02/02/breaking-out-is-hard-to-do/"/>
    <updated>2012-02-02T11:23:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/02/02/breaking-out-is-hard-to-do</id>
    <content type="html"><![CDATA[<p>One of the things I&#8217;d like to do for the iteration library is settle
on a convention for breaking and continuing within loops.  There is a
bug on this issue (<a href="https://github.com/mozilla/rust/issues/1619">#1619</a>) and it seems like the general
approach is clear but some of the particulars are less so.  So I
thought I&#8217;d try to enumerate how code will look under the various
alternatives and then maybe we can settle on one: they&#8217;re all fairly
similar.  Who knows, maybe just writing things out will settle my
mind.</p>

<h3>Alternative #1: The <code>loop_ctl</code> type.</h3>

<p>This was my original proposal.  Basically, there will be a type
called <code>iter::loop_ctl</code> defined like so:</p>

<pre><code>enum loop_ctl { lc_break, lc_cont }
</code></pre>

<p>I wanted to design something that felt as much like normal loops as
possible.  So my thought was that, for sugared closures where the
return type was <code>loop_ctl</code>, the compiler could insert <code>lc_cont</code> as the
tail expression should there not be one already.</p>

<p>The idea then was to change <code>vec::iter()</code> from a function with the signature:</p>

<pre><code>fn iter&lt;T&gt;(v: [T], f: fn(T))
</code></pre>

<p>to the following:</p>

<pre><code>fn iter&lt;T&gt;(v: [T], f: fn(T) -&gt; loop_ctl)
</code></pre>

<p>In other words, the function supplied to <code>iter</code> would be allowed to
break the loop in the middle if it wanted to.  Due to the default
rules, this is mostly invisible, except that you can say <code>break</code> and
<code>cont</code> and things work as you expect.</p>

<p>Unfortunately, as I think more about it, I realize that the default
rules aren&#8217;t quite subtle enough.  It&#8217;s only <em>mostly</em> invisible.  For example:</p>

<pre><code>vec::iter(v) {
    while cond { }
}
</code></pre>

<p>This would fail because while loops have a result type of <code>()</code>, and in
this case the <code>while</code> loop occupies the tail expression slot.  So you would
have to write:</p>

<pre><code>vec::iter(v) {
    while cond { }
    cont;
}
</code></pre>

<p>This makes me unhappy.  I&#8217;m happy with smart rules but only if they
really work all the time or have a consistent story.  You could extend
the rule to say &#8220;if the tail expression has unit type, still insert a
default <code>cont</code>&#8221;, but now it&#8217;s starting to sound really magical.</p>

<h3>Alternative #2:</h3>

<p>We can keep the <code>loop_ctl</code> type but just not make it special.  Iterable
types define two methods, <code>iter</code> and <code>iter_brk</code> (not sure about those names),
with signatures as shown (these are for vectors):</p>

<pre><code>iface iterable&lt;T&gt; {
    fn iter(f: fn(T)); /* as today */
    fn iter_brk(f: fn(T) -&gt; loop_ctl);
}
</code></pre>

<p>Now when you want to be able to break, you use <code>iter_brk</code> and have to end the loop body explicitly with <code>cont</code>.</p>

<p>For most types, you need only define <code>iter_brk</code>: the <code>iter</code> function
itself can be defined generically as shown (this assumes traits are
implemented):</p>

<pre><code>trait base_iter&lt;T&gt; {
    req fn iter_brk(f: fn(T) -&gt; loop_ctl);

    fn iter(f: fn(T)) {
        self.iter_brk {|e|
            f(e);
            cont;
        }
    }
}
</code></pre>
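
<p>Since traits are not implemented yet, here is a sketch of how
alternative #2 could look in present-day Rust, where a trait with a
provided method does exactly this job.  The names <code>LoopCtl</code>,
<code>BaseIter</code>, <code>Range</code> and <code>first_multiple</code> are invented for the
example, and the closure returns the control value explicitly rather than
through <code>break</code> and <code>cont</code>.</p>

<pre><code>#[derive(Debug, Clone, Copy, PartialEq)]
enum LoopCtl { Break, Cont }

trait BaseIter&lt;T&gt; {
    // The one method each iterable type has to provide.
    fn iter_brk&lt;F: FnMut(T) -&gt; LoopCtl&gt;(&amp;self, f: F);

    // Defined once, generically: never breaks.
    fn iter&lt;F: FnMut(T)&gt;(&amp;self, mut f: F) {
        self.iter_brk(|e| {
            f(e);
            LoopCtl::Cont
        })
    }
}

struct Range { lo: u32, hi: u32 }

impl BaseIter&lt;u32&gt; for Range {
    fn iter_brk&lt;F: FnMut(u32) -&gt; LoopCtl&gt;(&amp;self, mut f: F) {
        for i in self.lo..self.hi {
            if f(i) == LoopCtl::Break {
                return;
            }
        }
    }
}

fn break_if(b: bool) -&gt; LoopCtl { if b { LoopCtl::Break } else { LoopCtl::Cont } }
fn cont_if(b: bool) -&gt; LoopCtl { break_if(!b) }

// Stops iterating as soon as a multiple of k is seen.
fn first_multiple(r: &amp;Range, k: u32) -&gt; Option&lt;u32&gt; {
    let mut found = None;
    r.iter_brk(|n| {
        if n % k == 0 {
            found = Some(n);
        }
        break_if(found.is_some())
    });
    found
}

fn main() {
    let r = Range { lo: 10, hi: 20 };
    let mut sum = 0;
    r.iter(|n| sum += n);
    println!("sum = {}, first multiple of 7 = {:?}", sum, first_multiple(&amp;r, 7));
    println!("cont_if(true) = {:?}", cont_if(true));
}
</code></pre>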

<h3>Alternative #3:</h3>

<p>Same as #2, but we replace the <code>loop_ctl</code> type with boolean.  This is
appealing because it&#8217;s so minimalistic.  <code>break</code> would effectively
return <code>false</code> and <code>cont</code> would return <code>true</code>.  This makes <code>iter_brk</code>
effectively the same as the predicate test <code>all()</code>, which returns true
if the block returns true for all members.</p>

<p>Of course, if we actually <em>used</em> <code>all()</code> instead of <code>iter_brk</code> that&#8217;d be
ok too, except that the return type of all is <code>bool</code>, so a semicolon would
be required:</p>

<pre><code>v.all {|i|
    ...
};
</code></pre>

<p>We could of course have both <code>all</code> and <code>iter_brk</code> (as we would in
alternative #2).</p>

<h3>My preference?</h3>

<p>I started out liking alternative #1, but writing this blog post has
more-or-less persuaded me that I prefer alternative #2.  Less compiler
magic is good, and compiler magic that fails is bad.  Between
alternatives #2 and #3, I tend to slightly prefer an explicit
<code>loop_ctl</code> type over a boolean for a couple of reasons:</p>

<ul>
<li>the types more closely reflect the intention.  To me, testing
whether a predicate holds on all members is not the same as
interrupting a loop early.</li>
<li>you can&#8217;t use <code>break</code> and <code>cont</code> to return out of arbitrary blocks
that happen to return boolean.</li>
<li><p>you can always write helpers like</p>

<pre><code>fn break_if(b: bool) -&gt; loop_ctl { if b { lc_break } else { lc_cont } }
fn cont_if(b: bool) -&gt; loop_ctl { break_if(!b) }
</code></pre>

<p>to convert between <code>bool</code> and <code>loop_ctl</code> when convenient.</p></li>
</ul>


<p>But obviously there is no substantive difference between alternatives
#2 and #3.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Cross-crate inlining]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/02/02/cross-crate-inlining/"/>
    <updated>2012-02-02T08:41:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/02/02/cross-crate-inlining</id>
    <content type="html"><![CDATA[<p>Cross-crate inlining (CCI) refers to the ability to inline a function
across crate boundaries.  In Rust, a &#8220;crate&#8221; is the unit of
compilation, rather than an individual file as in C or C++.  A crate
basically corresponds to a single library or executable, but it may
contain any number of modules and source files internally.  CCI is
important for performance due to the ubiquitous use of small methods
like <code>vec::iter()</code> in our source code.  Such methods have proven to be
a very scalable way to define iteration abstractions, but performance is
currently somewhat lacking.</p>

<p>The major language-level issue associated with CCI is that it
interferes with separate compilation.  I won&#8217;t talk about this at the
moment; we may choose to only inline when statically linking, or to
give users some way to distinguish what can be inlined and what must
not be, etc.</p>

<p>What I do want to look at is the best way to <em>implement</em> CCI in our
compiler.  Right now the compiler is focused on compiling one crate at
a time and so a few things will have to change.</p>

<p>pcwalton forwarded me a partial patch which tries to separate out
various parts of the compiler to generalize to multiple crates.  I am
not sure, though, that this is worth the effort: after all, we are
still compiling one <em>main</em> crate, we&#8217;re just borrowing code from other
crates.  Furthermore, we never intend to report errors on the imported
crates: they have typechecked etc, so nothing should go wrong.  The
only reason that we will need to know about them at all is for line
number reporting within the compiler.  Therefore, I am leaning now
towards keeping things mostly the same, but adding files from inlined
crates into the existing <code>codemap</code> structure where necessary.</p>

<p>Another question is how to make the AST available within a compiled
crate; the inliner will need to reference it to produce an inlined
version, after all.  There are a couple of dimensions here.  The first
is how to serialize the AST at all&#8212;the easiest way is to use the
Rust pretty printer.  The best way, I think, is to write the tree out
in EBML.  The reason for this is that it allows us to retain the spans
from the original source, which will be lost by the pretty printer.
It also allows us to keep the various information we keep in
side-tables, such as the type associated with each node.  We may
nonetheless start with pretty printed source and change later, if that
proves expedient.</p>

<p>The second dimension to the question of serializing source is at what
granularity to do it.  Graydon has pointed out that we should be
sensitive to the compile-time impact of inlining and monomorphization,
and I agree.  However, we are primarily interested in the compile-time
impact on <em>debug-mode</em> compilations, meaning that inlining would
probably be disabled.  Nonetheless, we should be careful about how we
package up the source for three reasons: (1) monomorphization, when it
occurs, will require access even in debug mode; (2) if we play our
cards right, we may be able to consolidate some of the control paths
we use in type checking and elsewhere (more on that in a bit); and (3)
having faster compile times even with optimizations enabled never
hurts.</p>

<p>This suggests to me that we do not want to include the source for an
entire crate at a time.  It seems like items are the logical level for
this.  I am wondering if we could encode the module structure using
EBML but at each top-level item we stop and encode two things: the
signature of the item as well as the source.  Currently, we encode the
signature using EBML, and perhaps we can just keep that path, though
it may be easier to pretty-print the signature and then parse it
again.</p>

<p>Why would that be easier, you ask? After all, the current system
works, right? Well, the idea (hat tip to pcwalton here) is to
consolidate some of the logic in the compiler.  Currently, every time
we resolve the type of an item, we must ask &#8220;is it in the current
crate or not?&#8221; If it is, we go through one path, which involves
looking up the AST and other internal tables.  If it is not, then we
look into a crate metadata cache and&#8212;if needed&#8212;reconstruct the
signature by parsing EBML.  So my thought is that perhaps we can bring
these paths together by filling in the AST and other information lazily,
extracting what is needed out of the crate.</p>

<p>Actually, the question of pretty-printing vs EBML is somewhat
orthogonal, I suppose.  In fact, it might be better to keep the EBML
for the reasons discussed earlier (easier to reconstruct the various
side table information that is required).</p>

<p>So, I am starting to see a high-level vision for how this might all be
organized, but I don&#8217;t know these paths of the code that well, so I
might turn out to be rather confused.  It&#8217;s also a bit unclear to me
if consolidating the paths through the compiler is important to CCI or
just a nice thing to do.  I&#8217;d rather get results first and work on
refactoring second.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Update]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/02/01/update/"/>
    <updated>2012-02-01T20:50:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/02/01/update</id>
    <content type="html"><![CDATA[<p>It&#8217;s been a while since I wrote anything on the blog! A lot has been
going on in the meantime, in Rust, in parallel JavaScript, and
personally&#8230;I hate to write a big update post but I gotta&#8217; catch up
somehow!</p>

<h3>Rust</h3>

<p>First, we made our 0.1 release, which is great.  We are now planning
for 0.2.  The goal is to make frequent, relatively regular releases.
We&#8217;re still in such an early phase that it doesn&#8217;t seem to make sense
to literally release every few months, but at the same time we don&#8217;t
plan to wait long.</p>

<p>Lately I&#8217;ve been working on several issues, some major, some minor.
The most exciting to me has been <a href="https://github.com/mozilla/rust/issues/1493">#1493</a>, which vastly improved
our performance when working with thread-local boxes.  There is still
<a href="https://github.com/mozilla/rust/issues/1737">work to be done on box performance</a>, but before taking any next
steps more investigation is needed.  The problems may be creating type
descriptors, they may be garbage collection, etc.  This is also a
first step to
<a href="https://github.com/mozilla/rust/issues/1739">simplifying some of the scarier parts of the Rust runtime</a>.</p>

<p>On a smaller note, I&#8217;ve been working on a new iteration library that
I&#8217;m reasonably excited about, based on the principles from
<a href="http://smallcultfollowing.com/babysteps/blog/2011/12/29/composing-blocks/">an earlier blog post</a>.  Making it nice to use will require a
<a href="https://github.com/mozilla/rust/issues/1649">lighter bind syntax</a>, which I&#8217;ve almost got complete now.
This also (as a side effect) improves our type inference somewhat.
One final thing I&#8217;d like to do is integrate support for
<a href="https://github.com/mozilla/rust/issues/1619">break and cont</a> into the library, though some details remain to
be discussed there.</p>

<p>Finally, I&#8217;m beginning work now on cross-crate inlining.  I plan to
write a post later exploring some of the dynamics of this.  It&#8217;s an
interesting problem.  It&#8217;s clearly important for performance, though:
functions like <code>vec::iter()</code> are used ubiquitously and we need to be
able to inline their definitions.  This is also where Rust&#8217;s emphasis
on static dispatch can really pay off (not that it is not helpful
already at reducing dispatch costs).</p>

<h3>Parallel JavaScript</h3>

<p>I have been progressing on the parallelism work that occupied the
<a href="http://smallcultfollowing.com/babysteps/blog/2012/01/09/parallel-javascript/">last few posts</a>.  First, the parallel JavaScript work now
has an official name: the <a href="https://github.com/nikomatsakis/pjs">PJs project</a>.  That github project is
not that useful at the moment, though: I am in the process of moving
the code from an <a href="https://github.com/nikomatsakis/pjs-old">older GitHub project</a> into the new one
which contains the rest of mozilla-central.</p>

<p>What exists today is a (very) simple multi-threaded infrastructure
that will run JavaScript functions in parallel, multiplexing them over
a fixed number of workers.  Right now though there is no data sharing
between threads.  I am working on the membrane approach.</p>

<p>I have also had some discussions with the <a href="https://github.com/RiverTrail/RiverTrail">Rivertrail</a>
group.  The Rivertrail definition of an elemental function dovetails
very well with my own ideas, so it seems like the two projects could
be fruitfully combined: the PJs API can be used both for task-based
parallelism and for those instances where vectorization either
fails or is not effective.</p>

<p>Finally, I drew up a <a href="https://github.com/nikomatsakis/patpar">Java-based prototype</a> of the API as well
as the <a href="http://smallcultfollowing.com/babysteps/blog/2011/12/09/pure-blocks/">static type system that I described earlier</a>.  Having
this prototype gives me confidence that the approach will work, as I
found it quite easy to parallelize a number of interesting examples.</p>

<h3>Whew!</h3>

<p>Man, I didn&#8217;t realize how much stuff has been going on.  No wonder I
haven&#8217;t had time for blog posts!  And this isn&#8217;t even a complete list,
really.  But it&#8217;s everything that&#8217;s interesting, I suppose.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Proposed JS parallelism vs actors]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/01/11/proposed-js-parallelism-vs-actors/"/>
    <updated>2012-01-11T10:38:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/01/11/proposed-js-parallelism-vs-actors</id>
    <content type="html"><![CDATA[<p>In one of the comments on yesterday&#8217;s post,
<a href="http://smallcultfollowing.com/babysteps/blog/2012/01/09/parallel-javascript/#comment-407714243">Tushar Pokle asked</a> why I would champion my model over an
Erlang model of strict data separation.  There are several answers to
this question.  The simplest answer is that Web Workers already
provide an actors model, though they do not make tasks particularly
cheap (it&#8217;s possible to work around this by creating a fixed number of
workers and sending tasks for them to execute).</p>

<p>The better answer is that I don&#8217;t think that Erlang&#8217;s actor model and
the model I propose are that far apart.  I see this model as a kind of
&#8220;delimited actor&#8221;.  Why would I say that, since it does not seem to
resemble actors at a superficial level?  The reason is that, in my
model, the children are quite isolated from one another.  In a typical
&#8220;shared memory&#8221; model, processes communicate by modifying common data
structures.  This turns out to be highly unreliable.</p>

<p>In the model I propose (which needs a name), processes may share data
structures, but they cannot communicate this way, as those structures
are immutable.  In fact, the only way that sibling processes can
communicate with one another is by joining each other.  This allows
them to receive the other process&#8217;s result.  This is effectively a
one-shot message from one task to another.</p>

<p>So, in a way my model is a simplification of actors: it allows you to
spawn a set of actors.  The parent data which they share is
effectively an initial message from the parent to each child.  The
child&#8217;s result is then a one time message from each child to the
parent (or to other siblings in a <a href="http://en.wikipedia.org/wiki/Directed_acyclic_graph">DAG</a>-like fashion).</p>

<p>I don&#8217;t actually think my model is a good choice as the <em>only</em> model
for parallelism in your language, but I think it <em>complements</em> actors
quite well.  Consider what you would do in Erlang if you want to
process the members of a list in parallel: you would create a task for
each member of the list and send it whatever context it requires.  You
would then receive back the new values and construct the new list.  In
other words, you would implement precisely the messaging pattern that
this model defines.</p>
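
<p>As a rough illustration of that messaging pattern (a Python sketch, not
Erlang and not the proposed JavaScript API; the names <code>parallel_map</code>,
<code>scale</code>, and <code>context</code> are made up for this example), each child
receives the shared context as its &#8220;initial message&#8221; and its return value
is the one-shot reply:</p>

```python
from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType


def parallel_map(func, items, context):
    """Run func(context, item) for every item and collect the results.

    The context is wrapped in a read-only view, standing in for the
    immutable parent data that every child may read but not modify.
    """
    shared = MappingProxyType(dict(context))
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each submitted call plays the role of one child task.
        futures = [pool.submit(func, shared, item) for item in items]
        # Joining a child yields its result: a one-shot message back.
        return [f.result() for f in futures]


def scale(ctx, x):
    # Hypothetical child body: reads the shared context, returns a value.
    return ctx["factor"] * x


print(parallel_map(scale, [1, 2, 3], {"factor": 10}))
```

<p>The thread pool here is only a stand-in for cheap tasks; the point is the
shape of the communication, not the performance.</p>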

<p>Of course, as often happens, supporting only a limited model for
messaging lets you optimize things in the implementation. Because we
know that the child processes are only of limited duration, we don&#8217;t
have to copy the parent&#8217;s data but can instead allow them to reference
it (readonly) directly.  Similarly, because we know that each child is
dead when its result is received, we don&#8217;t have to copy the result
into the parent&#8217;s address space, but can again reuse the values
directly.  Finally, the garbage collector does not have to consider
the case of cross-process garbage collection: the parent&#8217;s data is
immutable, so whatever is live will remain live.  The data accessible
to each child is disjoint, so we can collect data owned by each child
independently without looking at the others.</p>

<p>I am tempted to call my model &#8220;delimited actors&#8221;, but I think it looks
sufficiently different from actors that this name might be misleading.
But, as I just argued, I think it is closer to actors than to the
&#8220;shared memory&#8221; model that has caused so many problems and
difficulties.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Parallel Javascript]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/01/09/parallel-javascript/"/>
    <updated>2012-01-09T16:55:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/01/09/parallel-javascript</id>
    <content type="html"><![CDATA[<p>Lately the ideas for a parallel, shared memory JavaScript have begun
to take shape.  I&#8217;ve been discussing with <a href="http://calculist.org/">various</a>
<a href="http://blog.mozilla.com/luke/">Java</a><a href="http://mozakai.blogspot.com/">Script</a> <a href="http://donovanpreston.blogspot.com/">luminaries</a> and it seems like a
design is starting to emerge.  This post serves as a documentation of
the basic ideas; I&#8217;m sure the details will change as we go along.</p>

<h3>User Model</h3>

<p>The model is that a JavaScript worker (the &#8220;parent&#8221;) may spawn a
number of child tasks (the &#8220;children&#8221;).  The parent is suspended while
the children execute, meaning that it will not process events or take
other actions.  Once the children have completed the parent will be
re-awoken.</p>

<p>Each object in JavaScript is owned by the task which created it.
Children may access all of the objects of their parent, but only in a
read-only fashion.  This is enforced dynamically.  When a task
completes, its objects become owned by the parent.  Therefore, the
child tasks may create data and operate on it in a mutable fashion;
when they finish they can return some of this data to the parent as
their result.  When the parent resumes, it is now the owner of the
data (as the child has finished) and so it may freely manipulate it.</p>

<p>One nice feature of this model is that the data which a given piece of
code has access to is precisely the same set as the data which it is
allowed to read.  So reads can proceed with no overhead at all (well,
in theory; see the implementation section below).  Writes require an
extra check to guarantee that the object is owned by the task doing
the writing (again, in theory; see the implementation section below).</p>

<p>Parallel children will only be usable from web workers.  The reason is
that the model is inherently blocking (the parent must not execute
while the children are executing or we would have data races), and we
do not want to permit the main UI thread to be blocked.</p>

<p>One last piece of the puzzle concerns arrays and array buffers.  I
want the option to divide up large arrays and array buffers amongst
workers in such a way that each gets a disjoint view into the buffer.
Each worker could then read/write into their disjoint view in
parallel.  Access to the original array or array buffer would yield an
exception until all children have completed.  The user would select
how the view buffer should be divided up (tiled, striped,
checkerboard, etc).  This will probably be deferred until later though.</p>

<h3>API</h3>

<p>I would like to expose the API in two levels.  The first would be the
more primitive, building blocks API which permits child tasks to be
forked and joined.  The second would be a higher level API that
operates over entire arrays or array buffers.  The higher level API
would be more than just convenience: it would be needed for creating
the disjoint views discussed at the end of the previous section.</p>

<h4>Creating and querying tasks</h4>

<p>To execute in parallel, you begin by creating a scheduler:</p>

<pre><code>let sched = scheduler();
</code></pre>

<p>Each scheduler is bound to a parent task, which is always
the task which created it.  You can use the scheduler to create
child tasks which will execute while the scheduler is active.
Child tasks can be created in two ways:</p>

<pre><code>let task1 = sched.fork(function() {...});
let task2 = sched.forkN(n, function(idx) {...});
</code></pre>

<p>The first, <code>fork()</code>, simply creates a task that will execute the given
function.  The result <code>task1</code> is a task object, which supports the
method <code>get()</code> (more on that later).  The <code>forkN()</code> variant creates a
task which will process <code>n</code> items: the function provided as argument
will be invoked once per item, with <code>idx</code> ranging from <code>0</code> to <code>n-1</code>.
The individual invocations of this function may themselves occur in
parallel.</p>

<p>Executing the tasks can be done using the execute method of the scheduler:</p>

<pre><code>sched.execute()
</code></pre>

<p>This will cause a blocking parallel phase in which all forked tasks associated
with the scheduler execute.</p>

<p>Finally, each task produces a result.  For a <code>fork()</code> task, the result
is simply the return value of the function.  For a <code>forkN()</code> task, the
result is an array containing all the results (so
<code>[func(0), ..., func(n-1)]</code>).</p>

<p>Whichever way it is created, the result of a task can be accessed using
the <code>get()</code> method.  If executed within the parent, the <code>get()</code> method
must be called after the <code>sched.execute()</code> call or else an error occurs.
If executed within a child, the <code>get()</code> method will effectively join the
other task and read its result.  The result will then be read-only within
the child.</p>
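
<p>To make the intended behavior concrete, here is a small Python mock-up of
the scheduler.  It is only a sketch under several assumptions: the class
names <code>Scheduler</code> and <code>Task</code> are invented, <code>forkN()</code> is
spelled <code>fork_n()</code>, the invocations inside one <code>fork_n()</code> task run
sequentially, and joining a sibling from within a child is not modeled.</p>

```python
from concurrent.futures import ThreadPoolExecutor


class Task:
    """Handle for a forked task; get() is valid only after execute()."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._done = False
        self._result = None

    def _run(self):
        self._result = self._thunk()
        self._done = True

    def get(self):
        if not self._done:
            raise RuntimeError("get() called before sched.execute()")
        return self._result


class Scheduler:
    def __init__(self, workers=4):
        self._workers = workers
        self._tasks = []

    def fork(self, func):
        task = Task(func)
        self._tasks.append(task)
        return task

    def fork_n(self, n, func):
        # One task whose result is [func(0), ..., func(n-1)].
        return self.fork(lambda: [func(i) for i in range(n)])

    def execute(self):
        # The caller (the parent) blocks here until every task is done.
        with ThreadPoolExecutor(max_workers=self._workers) as pool:
            futures = [pool.submit(t._run) for t in self._tasks]
            for f in futures:
                f.result()  # re-raise any exception from a child
        self._tasks = []


sched = Scheduler()
task1 = sched.fork(lambda: 6 * 7)
task2 = sched.fork_n(3, lambda idx: idx * idx)
sched.execute()
print(task1.get(), task2.get())
```

<p>Calling <code>get()</code> on a task before <code>execute()</code> raises an error in this
mock-up, mirroring the rule described above for the parent.</p>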

<h4>Dividing arrays and buffers</h4>

<p>Building on this low-level API, there is a higher-level API for processing
arrays and buffers.  I am not precisely sure how this should look, but I
think it will be something like:</p>

<pre><code>array.update_in_parallel(strategy, function(ctx, view) {
    ... array is inaccessible in here; each child task gets
        a disjoint view which is a read/write slice of the array ...
});
</code></pre>

<p>Here the parameter <code>strategy</code> would specify how the array should be
divided into views.  I think the best thing is probably to look at the
X10 and Chapel languages (as well as their predecessors) and see how
they handle the dividing of arrays across distributed processors.  We
probably want something similar but (hopefully) simpler.  In an ideal
world, <code>strategy</code> would be some sort of function so that users could
specify how the array is divided, though this opens the door to races
if the function is invalid.  Another simpler alternative is to allow
various strings like &#8220;tiled&#8221;, &#8220;striped&#8221;, etc.</p>
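
<p>As a sketch of what such strategies might compute (in Python, with
invented function names; nothing here is part of the proposed API), a
strategy can be seen as a function from an array length to a list of index
sets, one per child.  A user-supplied strategy is only safe if those sets
are disjoint, which is what <code>is_valid_partition</code> checks:</p>

```python
def tiled(length, parts):
    """Split range(length) into `parts` contiguous chunks."""
    bounds = [length * p // parts for p in range(parts + 1)]
    return [list(range(bounds[p], bounds[p + 1])) for p in range(parts)]


def striped(length, parts):
    """Deal indices out round-robin: part p gets p, p + parts, ..."""
    return [list(range(p, length, parts)) for p in range(parts)]


def is_valid_partition(views, length):
    """True if every index in range(length) appears in exactly one view."""
    seen = sorted(i for view in views for i in view)
    return seen == list(range(length))


print(tiled(10, 3))    # contiguous chunks
print(striped(10, 3))  # interleaved indices
print(is_valid_partition([[0, 1], [1, 2]], 3))  # overlapping views
```

<p>Running a check like this before the parallel phase starts is one way a
function-valued strategy could be admitted without opening the door to
races, at the cost of a pass over the indices.</p>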

<h3>Implementation</h3>

<p>A plan for a working prototype based on Spidermonkey has begun to
emerge.  First, the idea would be to have a pool of worker threads
that will execute the parallel tasks by drawing on a shared queue (or
possibly using work stealing, this is not so important).</p>

<p>The tricky part is how to manage the shared data from the parent task.
We need to make that data available to the children in such a way that
it can (a) be safely read in parallel and (b) not be modified.  You
may think that (a) should come for free, but in fact it does not.
This is because Spidermonkey optimizes reads, which in JavaScript can
be quite complex, by making use of caching and other techniques which
do in fact modify the representation of the object being read.</p>

<p>The technique that we plan to use is similar to what is used to
protect domains from one another.  Each task will have its own
<a href="http://andreasgal.com/2010/10/13/compartments/">compartment</a>.  This in effect creates a partitioned heap
where each task has a separate heap.  Then, for each upvar of the
parent that must be used from the child, a proxy object (not a JS
proxy, but a Spidermonkey proxy, which is a lower level tool) will be
created.  Reads to this parent object will therefore always go through
a proxy: this proxy is responsible for taking a naive read path that
does not modify the parent object.  This means that reads of parent
objects during parallel code will be somewhat slower.  However, as a
benefit, this proxy can trivially prevent writes to parent objects, so
it is easy to ensure that the siblings do not step on each other.</p>
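
<p>The flavor of such a proxy can be shown with a toy Python analogue.  This
is only an illustration of the idea and says nothing about how Spidermonkey
proxies are actually implemented; the class <code>ReadOnlyProxy</code> and the
example <code>Point</code> type are invented, and the sketch handles plain data
objects only (methods on the target are not guarded).</p>

```python
class ReadOnlyProxy:
    """Forwards attribute reads to a parent object and rejects writes."""

    __slots__ = ("_target",)

    def __init__(self, target):
        # Bypass our own __setattr__, which always refuses.
        object.__setattr__(self, "_target", target)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for the target's fields.
        value = getattr(object.__getattribute__(self, "_target"), name)
        # Wrap nested data objects so everything reachable stays read-only.
        if hasattr(value, "__dict__") and not callable(value):
            return ReadOnlyProxy(value)
        return value

    def __setattr__(self, name, value):
        raise PermissionError("parent objects are read-only in child tasks")

    def __delattr__(self, name):
        raise PermissionError("parent objects are read-only in child tasks")


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


parent_obj = Point(1, 2)
view = ReadOnlyProxy(parent_obj)
print(view.x + view.y)  # reads are forwarded to the parent object
try:
    view.x = 10
except PermissionError as err:
    print("write rejected:", err)
```

<p>As in the design above, every read pays for an extra hop through the
proxy, and in exchange writes can be refused in one place.</p>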

<p>Once a task ends, the data in its compartment can be given to the
parent.  Basically a compartment contains a number of arena pages.
Those arena pages can simply be moved from the child task&#8217;s
compartment into the parent task&#8217;s compartment.  The only thing left
to do is to replace any references to the proxied parent objects with
the actual objects themselves.  Note that because the parent objects
are immutable, it will not be possible to have references from the
parent objects to the child objects, only the other way around.  We
could also run the garbage collector if we wanted, which would cause
all child data not reachable from the child&#8217;s result to be collected.</p>

<p>One big advantage of this proxy-and-compartment-based design is that
it should not require modifying the JIT or the interpreter.  Both of
those systems are already aware of proxies and would treat them specially.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Composing blocks]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2011/12/29/composing-blocks/"/>
    <updated>2011-12-29T21:19:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2011/12/29/composing-blocks</id>
    <content type="html"><![CDATA[<p>The original Rust design included iterators very similar to Python&#8217;s
generators.  As I understand it, these were stripped out in favor of
Ruby-esque blocks, partially because nobody could agree on the best
way to implement iterators.  I like blocks, but it seems like it&#8217;s
more natural to compose iterators, so I wanted to think a bit about
how one might use blocks to achieve similar things.  I&#8217;m sure this is
nothing new; there must be hundreds of libraries in Haskell that do
the same things I&#8217;m talking about here.</p>

<p>A very simple example of what I mean by iterator composition is
Python&#8217;s <code>enumerate()</code> function, which converts an iterator over <code>T</code>
items into an iterator over <code>(uint, T)</code> pairs, where the <code>uint</code>
represents the index.  This handy little function allows any loop to
easily track the index, no matter what it is iterating over.  So I can
write:</p>

<pre><code>for (idx, elem) in enumerate(list): ...
for (idx, elem) in enumerate(dict.keys()): ...
for (idx, elem) in enumerate(dict.values()): ...
for (idx, elem) in enumerate(anything at all): ...
</code></pre>

<p>This is very useful.</p>

<p>Now, in Rust, we have a function <code>vec::iter()</code>, defined like so:</p>

<pre><code>fn iter&lt;T&gt;(v: [T], blk: block(T)) {
    uint::range(0, vec::len(v)) { |i|
        blk(v[i]);
    }
}
</code></pre>

<p>Suppose that we wanted to write some kind of generic <code>enumerate()</code>
style function that would convert a function like <code>iter</code> into one
that provides indices.  I think the only way to do this in Rust is
to write something like:</p>

<pre><code>fn enumerate&lt;S,T&gt;(iter_fn: block(block(T)), blk: block(uint, T)) {
    let i = 0u;
    iter_fn() { |t|
        blk(i, t);
        i += 1u;
    }
}
</code></pre>

<p>This would then be used like so:</p>

<pre><code>enumerate(bind vec::iter(v, _)) { |i, e| ... }
enumerate(bind m.keys(_)) { |i, e| ... }
enumerate(bind m.values(_)) { |i, e| ... }
</code></pre>
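
<p>For readers more at home in Python, the same composition can be sketched
with callbacks standing in for blocks.  The names <code>vec_iter</code> and
<code>enumerate_with</code> are made up for this example, and
<code>functools.partial</code> plays the role of <code>bind</code>:</p>

```python
from functools import partial


def vec_iter(v, blk):
    """Internal iterator: call blk on each element of v."""
    for elem in v:
        blk(elem)


def enumerate_with(iter_fn, blk):
    """Adapt any one-argument internal iterator to also pass an index."""
    i = 0

    def numbered(elem):
        nonlocal i
        blk(i, elem)
        i += 1

    iter_fn(numbered)


pairs = []
# partial(vec_iter, [...]) corresponds to "bind vec::iter(v, _)".
enumerate_with(partial(vec_iter, ["a", "b"]), lambda i, e: pairs.append((i, e)))
print(pairs)
```
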

<p>Overall, this is not too bad.  A lighterweight curry syntax would make
it somewhat more pleasant, but I rather like <code>bind</code> as it is, so I
don&#8217;t have any concrete suggestions.  Besides, after my foray into
expanding the possibilities of block sugar in expressions, I am done
with thinking about syntax for a little while!</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Block sugar in expressions]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2011/12/29/block-sugar-in-expressions/"/>
    <updated>2011-12-29T07:30:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2011/12/29/block-sugar-in-expressions</id>
    <content type="html"><![CDATA[<p><strong>UPDATE:</strong> I found some more complications.  Updates inline.</p>

<p>I have been working on and off on allowing block sugar to appear in
Rust expressions and not only statements.  For those who do not know
what I am talking about, let me give a bit of context.  At the moment,
one can write the following in Rust:</p>

<pre><code>vec::iter(v) { |e| 
   ...
}
</code></pre>

<p>which is sugar for the function call:</p>

<pre><code>vec::iter(v, { |e| 
   ...
})
</code></pre>

<p>Objectively, there isn&#8217;t much difference between the two, but somehow
pulling the <code>{||}</code> out of the parentheses feels much lighter to me.</p>

<p>However, today, this sugar is only allowed in statements.  That is,
the result of the call to <code>vec::iter()</code> is ignored.  For
<code>vec::iter()</code>, this makes sense, since the result is unit.  But it
might also be nice to be able to write:</p>

<pre><code>let foo = vec::map(v) { |e|
    ...
};
</code></pre>

<p>or:</p>

<pre><code>let child = task::spawn { ||
    ...
};
</code></pre>

<p>In implementing this, however, I&#8217;ve run into a few dark corners of the
syntax.  I&#8217;m trying to find the best way to support such sugar with
minimal changes to the language.</p>

<h4>The problem</h4>

<p>Today we attempt to determine <em>in the parser</em> whether an expression
may yield a usable value.  In the case of expressions that double as
statements, this is generally specified by the presence or absence of
a trailing semicolon.  In the case of blocks and other compound
statements, the presence or absence of a semicolon in the final
expression within the block is significant.  Therefore, we can have
something like:</p>

<pre><code>fn foo(...) -&gt; T {
    if expr1 {
          if expr2 { expr3 } else { expr4 }
    } else {
          expr5
    }
}
</code></pre>

<p>This is a function which returns a value.  This is very different from
the same code with semicolons:</p>

<pre><code>fn foo(...) {
    if expr1 {
          if expr2 { expr3; } else { expr4; };
    } else {
          expr5;
    }
}
</code></pre>

<p>This is code that executes but the result values are ignored.</p>

<p>This system doesn&#8217;t work so well with the syntactic sugar, as
<code>vec::iter(v) { |e| ... }</code> and <code>vec::map(v) { |e| ... }</code> both look the
same, but the latter produces a value.  Therefore, the parser is
unable to distinguish between them to decide whether the expression
produces a value or not.</p>

<p>This ambiguity only becomes significant if the sugared expression
appears at the top-level of a block (e.g., where it can be interpreted
as a statement).  Here we can distinguish between two cases:</p>

<h5>Case 1: In the middle of a block</h5>

<p>Consider a block like:</p>

<pre><code>{
    foo {|| ... }
    -10
}
</code></pre>

<p>How do I interpret this?  Based on the whitespace, it appears to
have been intended this way:</p>

<pre><code>{ foo({|| ... }); -10 }
</code></pre>

<p>But it could also be parsed this way:</p>

<pre><code>{ foo({|| ... }) - 10 }
</code></pre>

<p>I solved this with a simple rule: in a top-level expression, the
block sugar cannot be followed by binary operators, calls, fields,
etc.  Therefore, we would parse this block as two statements.  I and
others that I have asked find the other alternative (<code>foo {|| ...} -
10</code> as an expression) visually hard to parse.  If you want this, use
explicit parentheses.</p>

<h5>Case 2: Tail position in a block</h5>

<p>Does a block like <code>{ foo {|| ...} }</code> produce a value or not?  This is
trickier than the other option.</p>

<h4>Solutions</h4>

<p>I see four possible solutions to the &#8220;tail position&#8221; problem,
and I summarize them as follows:</p>

<ul>
<li><strong>Yes, it does produce a value.</strong></li>
<li><strong>No, it does not produce a value.</strong></li>
<li><strong>It depends on where the block appears.</strong></li>
<li><strong>The parser shouldn&#8217;t be doing this anyway.</strong></li>
</ul>


<p>Let me spell out these possible solutions in more detail.</p>

<h4>&#8220;Yes, it does produce a value.&#8221;</h4>

<p>This creates a distinction between loops like <code>while {...}</code> and loops
like <code>func(v) {...}</code>; while loops <em>always</em> have unit type but
<code>func(v) {...}</code> may not.  Today, however, that produces an error for a
snippet like this:</p>

<pre><code>while cond {
    vec::iter(v) { ... }
}
</code></pre>

<p>This is because the parser requires that while loop blocks do not have
an expression in tail position, and <code>vec::iter()</code> counts as such a
block.  We would therefore require a semicolon in such cases.  This
feels weird to me.</p>

<p>We can solve this by modifying the parser.  There are a few options,
but I think the most consistent overall is to permit expressions in
while loop bodies and elsewhere, but require them to have unit type.
This means that the above example would work, but a call to
<code>vec::any()</code> would not (it produces a boolean value) and neither would
a tail expression like <code>10</code> (it produces an int value).  Those would
require a trailing semicolon.</p>

<p><strong>UPDATE:</strong> This solution can lead to some complications.  Consider
code like the following:</p>

<pre><code>fn foo() {
    if cond {
        vec::iter(v) { ... }
    } else {
        vec::iter(v) { ... }
    }

    bar();
}
</code></pre>

<p>The first <code>if/else</code> now looks like an expression, because both blocks
produce a value (albeit a value of type unit).  In other words, the
<code>if/else</code> is classified the same as an <code>if/else</code> like <code>if cond { 10 }
else { 20 }</code> by the parser.  These &#8220;value-bearing&#8221; if/else expressions
require semicolons.  Therefore, the example doesn&#8217;t parse.  To solve
this, we say that <code>if/else</code>, <code>alt</code>, <code>do/while</code>, and standalone blocks
never require semicolons when used at top-level.</p>

<p>I find this more consistent anyhow.  Basically there is a category of
&#8220;dual-purpose&#8221; (statement and expression) forms.  These include
&#8220;keyword&#8221; expressions (<code>if</code>, <code>alt</code>, <code>while</code>, etc), standalone blocks,
and syntactic sugar calls.  If these dual-purpose expressions appear
at top-level, they are a statement.  Otherwise, they are an
expression.</p>

<h4>&#8220;No, it does not produce a value.&#8221;</h4>

<p>We could also say that a top-level expression never produces a value.
This feels consistent with the rule for top-level expressions that
appear in the middle of a block.  However, this solution disallows a
statement like:</p>

<pre><code>let w =
  if true { vec::any(abs_v) { |e| float::nonnegative(e) } }
  else { false };
</code></pre>

<p>Here, the call to <code>vec::any()</code> is clearly intended to be used as an
expression, but the parser interprets it as a statement, and so we get
a type error <code>expected () but found bool</code>.  The problem here is
insufficient context: the <em>block itself</em> appears in an expression
position, so it seems reasonable that the tail position of such a
block be treated as an expression!  This leads us to our next
solution.</p>

<h4>&#8220;It depends on where the block appears.&#8221;</h4>

<p>We can distinguish in the parser between blocks that appear in an
expression position and those that do not.  The key problem here
becomes function items themselves.  For example:</p>

<pre><code>fn foo() -&gt; bool {
    ...
    vec::any(abs_v) { |e| float::nonnegative(e) }
}
</code></pre>

<p>Is this call to <code>vec::any()</code> an expression?  The block here appears as
the body of a function, so it&#8217;s a bit hard to say.  I would like the
answer to be yes.  We could achieve this by examining the return type
of the function being parsed: if it is unit (or unspecified, which
defaults to unit) then we can parse the function body as a block in
statement position.  This means that <code>foo()</code> above would parse (and
type check).  In <code>bar()</code>:</p>

<p>This works well for functions, but the parser doesn&#8217;t always have
enough context to make this decision:</p>

<pre><code>fn foo() {
    vec::iter(v) { |e|
        vec::any(abs_v) { |e| float::nonnegative(e) }
    }
}
</code></pre>

<p>Is this call to <code>vec::any()</code> intended as an expression?  As it
happens, <code>vec::iter()</code> expects a block argument with unit result type;
so to follow our context-sensitive rules, <code>vec::any()</code> must be a
statement, but of course the parser doesn&#8217;t know that, and so the user
would have to put a semicolon. This leads us to our <em>next</em> solution.</p>

<h4>&#8220;The parser shouldn&#8217;t be doing this anyway.&#8221;</h4>

<p>We can have the parser say that the last expression in a block is
always an expression and not a statement, and just have the type
checker perform the context-dependent reasoning: any time that a block
is checked in a context where a unit result is expected, the type of
the tail expression is ignored.  This makes all of our examples
work but is mildly more permissive than the language today.  For example,
this code would type check:</p>

<pre><code>fn foo() {
    10
}
</code></pre>

<p>This is because <code>foo()</code> has a unit result type, so the type of the
tail expression (<code>int</code>) is ignored.  Most likely the result type of
<code>foo()</code> was accidentally omitted.  I think this is acceptable, though,
because the user will notice that the return type of <code>foo()</code> is missing
when they try to call it elsewhere in the code.</p>
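
<p>The rule itself is tiny.  A toy version, written in Python with types
represented as plain strings (the function <code>check_block</code> is purely
illustrative and is not how the Rust type checker is structured), might look
like this:</p>

```python
UNIT = "()"


def check_block(tail_type, expected):
    """Return the type given to a block whose tail expression has type
    tail_type, when the surrounding context expects `expected`."""
    if expected == UNIT:
        # A unit result is expected, so the tail value is simply dropped.
        return UNIT
    if tail_type != expected:
        raise TypeError("expected %s but found %s" % (expected, tail_type))
    return tail_type


print(check_block("int", UNIT))     # fn foo() { 10 } is accepted
print(check_block("bool", "bool"))  # fn foo() -> bool { vec::any(...) }
```
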

<p><strong>UPDATE:</strong> This also requires distinguishing &#8220;dual-purpose&#8221;
statements as described in the first solution (&#8220;yes, it does produce a
value&#8221;).  In fact, both the first and fourth solution are kind of the
same, but the fourth involves a bit more work in the type checker to
allow users to omit semicolons somewhat more often.</p>

<h4>So what should I do?</h4>

<p>I don&#8217;t know but I am leaning towards the first or final solutions, as
they seem the most consistent.  To be honest I am most concerned with
finding a rule that&#8217;s easy enough to explain and understand.  I don&#8217;t
want people to feel that the parsing rules are too confusing.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Tone and criticism]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2011/12/21/tone-and-criticism/"/>
    <updated>2011-12-21T17:02:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2011/12/21/tone-and-criticism</id>
    <content type="html"><![CDATA[<p>So, I worry that my various posts about Rust give the impression that
I&#8217;m dissatisfied with the language.  It&#8217;s true that there are several
things I&#8217;d like to change&#8212;and those are what I&#8217;ve focused on&#8212;but I
want to clarify that I quite like Rust the way it is and I find the
overall feel of the language to be very good.  When it comes to the
big decisions, I think Rust gets it right:</p>

<ul>
<li>Low-level control over how data is laid out in memory, but
high-level, expressive types like vectors, tuples, variadic types,
lightweight closures</li>
<li>Ability to use the stack, but safely:

<ul>
<li>Locals can be allocated on the stack</li>
<li>Blocks that directly reference the stack that created them
but do not leak</li>
</ul>
</li>
<li>Unique pointers for messaging (and possibly other things)</li>
<li>Immutable by default but mutable when desired</li>
<li>Static bindings for most calls</li>
<li>Lightweight tasks with cheap, growable stacks</li>
<li>A focus on type safety, and particularly on using types to achieve
goals beyond detecting typos, such as data-race freedom or
exhaustiveness checking</li>
</ul>


<p>None of these features are unique to Rust, but the <em>combination</em> is
new, and it&#8217;s powerful.  There are also plenty of small decisions I
think are fantastic:</p>

<ul>
<li>Crate files (an idea whose time had come) and the generally
simple command lines to invoke the compiler</li>
<li>Syntax: at first I thought <code>fn</code> and <code>ret</code> were overly terse,
but after working with them writing out <code>return</code> just seems so ponderous.
Similarly, parentheses-free syntax for <code>if</code> is surprisingly pleasant.</li>
<li>Unsigned types without implicit conversions</li>
</ul>


<p>I could go on but I guess that&#8217;s enough.  Anyway I think my point is
that I think Rust has the big picture down pat.  I would like to tweak
how it achieves some of those goals but even if my ideas never make it
or turn out to be flawed (as some of them no doubt are), Rust&#8217;ll be a
very nice language to use.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Dynamic race detection]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2011/12/20/dynamic-race-detection/"/>
    <updated>2011-12-20T11:19:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2011/12/20/dynamic-race-detection</id>
    <content type="html"><![CDATA[<p>In the context of thinking about parallelism for Rust, I have been reminded
of an older idea I had for a lightweight, predictable dynamic race
detection monitoring system based around block-scoped parallelism. I should
think this would be suitable for (an extended version of) a dynamic
language like Python, JavaScript, or Lua.  I will write in a Python-like
syntax since I know it best, but I am debating about exploring this
for JavaScript.</p>

<h4>Block-scoped parallelism</h4>

<p>The basic parallel model would be block-scoped parallelism as I&#8217;ve been
referring to in the various Rust posts.  To make this concrete, here is a
low-level example of how it might work; this is a dynamic version of
something like the <code>finish/async</code> blocks of X10. There are two basic concepts,
a <em>parallel region</em> and a set of <em>parallel tasks</em>.  Each task is created within
some parallel region; when the parallel region is <em>executed</em>, all of
the tasks begin execution.  They may continue to create new tasks.  Meanwhile,
the task which executed the parallel region blocks until it completes.</p>

<pre><code>def divide_and_conquer(...):
    if "small enough to do sequentially":
        process(list)
    else:
        p = parallel_region()
        task1 = p.new_task(lambda: divide_and_conquer(...))
        task2 = p.new_task(lambda: divide_and_conquer(...))
        p.execute()
        task1.get_result() # yields an error unless p.execute() has run
</code></pre>

<p>I will leave aside for the moment the question of deciding which problems
are small enough to do sequentially and so forth.  This task and
parallel region API is intended to be low-level.  Higher-level libraries
would build on it to make those sorts of decisions.</p>

<p>Anyway the API is somewhat orthogonal.  We could dicker about the
precise design, but I am more interested in the data-race freedom
part.  The key properties that the API must maintain are:</p>

<ul>
<li>Tasks are always created within a parallel region.</li>
<li>The parent task which executes the parallel region is always suspended
while its child tasks execute.</li>
</ul>


<h4>Monitoring for data races</h4>

<p>Now, the key idea: I want to say that each object is <em>owned</em> by the
task which creates it.  The data is then mutable only by its owner and
any parent task of the owner.  The data is <em>readable</em> by the owner and
any child task of the owner.  These constraints&#8212;assuming no
global data&#8212;suffice to guarantee that a given object cannot leak to
siblings of its owner.  Only the owner, ancestors of the owner, and
children of the owner can have access to the object.</p>

<p>This no-leak property is important, so let&#8217;s take a second to see why
it&#8217;s true: the idea is that two sibling tasks can only communicate via
shared memory.  This memory must have existed when the tasks were
created, so it must be owned by a common parent of the sibling tasks.
This common memory is therefore immutable: only the common parent could
modify it, and the common parent is suspended while its children
execute.</p>

<p>Based on this, we can say that whenever we modify a field of an object,
we must check that we are either the owner or a parent of the owner.
If we are a parent, then the owner must have terminated (else the
parent could not be active), so we can adjust the owner of the object
to be ourselves. Thus the check for &#8220;do I have write access?&#8221; is kind of a
variant of Tarjan&#8217;s union-find algorithm with path compression.</p>
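
<p>A rough sketch of that write check, in plain Python, might look like the
following.  The names <code>TaskNode</code>, <code>OwnedObject</code>, <code>WriteDenied</code>, and
<code>write_field</code> are hypothetical, and the sketch simply assumes that an
ancestor which is running implies the owner has terminated rather than
verifying it.</p>

<pre><code>class WriteDenied(Exception):
    pass

class TaskNode:
    def __init__(self, parent=None):
        self.parent = parent

class OwnedObject:
    def __init__(self, owner):
        self.owner = owner
        self.fields = {}

def is_ancestor(candidate, task):
    """True if `candidate` is a strict ancestor of `task`."""
    node = task.parent
    while node is not None:
        if node is candidate:
            return True
        node = node.parent
    return False

def write_field(current, obj, name, value):
    if obj.owner is not current:
        if not is_ancestor(current, obj.owner):
            raise WriteDenied("task is neither the owner nor an ancestor")
        # Take over ownership so that later writes by this task hit the
        # fast path; this plays the role of the compression step.
        obj.owner = current
    obj.fields[name] = value
</code></pre>

<p>Once an ancestor has taken ownership, the original owner can no longer
write to the object, which matches the assumption that it has already
terminated.</p>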

<p>Interestingly, no dynamic check at all is needed to do a read: either we
own the object, in which case we can read it, or it is owned by a parent
or extinct child.  In all of those cases reads are permitted.</p>

<p>I&#8217;ve phrased this write check as a per-object check, but I think that in
an actual implementation, I would really do it on groups of objects;
this would be implemented almost exactly like a write barrier in
a garbage collected environment, except that we have to not only write
out the dirty bit but read it first to check that we have write access.
Obviously this will be slower, but not that much slower I should
think, particularly as we are likely to be writing to recently
created objects, and hence have the dirty bit in cache.</p>

<h4>Ownership</h4>

<p>But sometimes we want to be able to pass mutable data to our children.
My idea for this is to use a kind of dynamic version of unique pointers.
In JavaScript, I would implement this with proxies: basically, you create
an object and when you create it, you say it is <em>mobile</em>.  What you get
back is a proxy to the real object, which is locked within the proxy and
never divulged.  Now you can use it like any object, but eventually you
can give it to a child. Your existing proxy is then set to some broken
state and a new proxy is created which holds the object. This proxy is handed by the
runtime to your newly spawned task. The task can always give it back to you by
returning it.</p>
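
<p>The same idea can be sketched in Python using attribute forwarding in
place of JavaScript proxies.  The class name <code>MobileProxy</code> and its
<code>give()</code> method are made up for this sketch, and it only guards plain
attribute reads and writes; a real version would also have to stop the
wrapped object from leaking out through return values.</p>

<pre><code>class MobileProxy:
    """Wraps an object; the wrapped object is never handed out directly."""

    def __init__(self, target):
        # Write through __dict__ to bypass our own __setattr__.
        self.__dict__["_target"] = target

    def _live_target(self):
        target = self.__dict__["_target"]
        if target is None:
            raise RuntimeError("this proxy was given away")
        return target

    def __getattr__(self, name):
        # Only reached for names that are not found on the proxy itself.
        return getattr(self._live_target(), name)

    def __setattr__(self, name, value):
        setattr(self._live_target(), name, value)

    def give(self):
        """Break this proxy and return a fresh one holding the object."""
        target = self._live_target()
        self.__dict__["_target"] = None
        return MobileProxy(target)
</code></pre>

<p>The runtime would call something like <code>give()</code> when spawning the child
and pass the fresh proxy along; any later use of the old proxy raises an
error.</p>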

<h4>What about other models?</h4>

<p>I showed this for a simple fork-join model.  I think you can rephrase it
in terms of futures or something like .NET&#8217;s parallel tasks as well.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Implementing unique closures]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2011/12/16/implementing-unique-closures/"/>
    <updated>2011-12-16T10:32:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2011/12/16/implementing-unique-closures</id>
    <content type="html"><![CDATA[<p>I landed a preliminary version of unique closures (which I am currently calling
sendable fns) on the trunk last night.  I wanted to briefly document what I did
to alter the design of closures to get this working (of course there is a comment
in the code too, but who reads that?).</p>

<p>Closures in Rust are represented as two words. The first is the function pointer
and the second is a pointer to the closure, which is the captured environment that
stores the data that was closed over.  Because of how Rust is implemented, the
closure must also store any type descriptors that were in scope at the point where
the closure was created.</p>

<p>Prior to my changes, the closure was represented by a structure roughly
equivalent to this C++ struct (actually I will use a hybrid of C++ and Java
syntax):</p>

<pre><code>template&lt;class BD, unsigned n_tds&gt;
struct closure {
    type_desc *bound_data_td;
    BD bound_data;
    type_desc *bound_tds[n_tds];
};
</code></pre>

<p>Here, the initial type descriptor <code>bound_data_td</code> is a type descriptor
that describes the struct <code>BD</code>.  It contains, among other things, the
size and alignment of <code>BD</code> etc, as well as pointers to the &#8220;take&#8221; and &#8220;drop&#8221;
functions, which copy and release the value respectively.</p>

<p>This layout had a few downsides.  One was that we could not load the type
descriptors from the <code>bound_tds</code> array without knowing the type of <code>bound_data</code>.
It&#8217;s possible, however, that we do not know the precise type of <code>bound_data</code>
for any specific closure instance.  The reason is that the closure might come
from a generic function, something like:</p>

<pre><code>fn make_closure&lt;copy T&gt;(t: T) -&gt; (lambda() -&gt; T) {
    ret lambda() -&gt; T { t };
}
</code></pre>

<p>Now, this closure is going to result in a bound_data that is <em>itself</em> generic!
It would look something like:</p>

<pre><code>struct make_closure_bound_data&lt;class T&gt; {
    T t;
};
</code></pre>

<p>The problem is that when we are generating code for this closure, we have to generate
one set of code that works for <em>any value of <code>T</code></em>. In other words, we know that
we have a closure whose type is something like <code>closure&lt;make_closure_bound_data&lt;?&gt;, 1&gt;</code>,
where <code>?</code> represents an unknown type.  Expanded that would look like:</p>

<pre><code>// closure&lt;make_closure_bound_data&lt;?&gt;, 1&gt;
struct make_closure_closure { 
    type_desc *bound_data_td;
    make_closure_bound_data&lt;?&gt; bound_data;
    type_desc *bound_tds[1];
};
</code></pre>

<p>The problem now is that because we do not know the precise type of <code>bound_data</code>,
we also do not know the offset of the field <code>bound_tds</code>!  The way we generally
handle this sort of situation is to use a type descriptor for the unknown type
<code>T</code>, which tells us how big a value of type <code>T</code> is and so forth.  Indeed, we have that
type descriptor, but it is stored in the <code>bound_tds</code>
array above.  But now you see a certain chicken-and-egg problem: to know how big the
bound data is, we have to load the type descriptor for <code>T</code>, but to load the type
descriptor, we have to know how big the bound data is.</p>

<p>The solution to this in the past was that we also had the <code>bound_data_td</code>, a type
descriptor for the entire set of bound data, and we could use its size field to
skip past the bound data to the type descriptor array.  But this was kludgy and
also dangerous: the code had to use a different code path (for various reasons
not worth getting into) than the other code that does this sort of dynamic
calculations, and I am not sure that it was correctly considering things like
alignment restrictions and so forth.</p>

<p>Therefore, I made some slight changes to the structure that eased these problems.
It is now represented like so:</p>

<pre><code>template&lt;class BD, unsigned n_tds&gt;
struct closure {
    type_desc *closure_td;
    type_desc *bound_tds[n_tds];
    BD bound_data;
};
</code></pre>

<p>There are two changes of note. First, the <code>closure_td</code> represents the <em>entire
closure</em> and not just the bound data.  Second, the set of bound type descriptors
is stored at a <em>statically known offset</em>, regardless of what data is closed over.
What&#8217;s more, while it may not be obvious at first, the offset of the bound_data
is also statically known: the reason is that when we are looking at one of these
structures, we know the number of bound type descriptors (i.e., we know <code>n_tds</code> in
the template above), and they are always of fixed size (a pointer). So that&#8217;s
good.</p>
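
<p>One way to see the difference between the two layouts is to model them
with Python&#8217;s <code>ctypes</code> and ask for the field offsets.  This is only an
illustration: <code>old_layout</code> and <code>new_layout</code> are hypothetical helpers,
<code>c_void_p</code> stands in for <code>type_desc*</code>, and the offsets follow the C
layout rules of whatever platform runs the script, not rustc&#8217;s actual
code generation.</p>

<pre><code>import ctypes

TypeDescPtr = ctypes.c_void_p  # stand-in for type_desc*

def old_layout(bd_type, n_tds):
    """Bound data first, so bound_tds moves with the size of bd_type."""
    class OldClosure(ctypes.Structure):
        _fields_ = [
            ("bound_data_td", TypeDescPtr),
            ("bound_data", bd_type),
            ("bound_tds", TypeDescPtr * n_tds),
        ]
    return OldClosure

def new_layout(bd_type, n_tds):
    """Type descriptors first, so bound_tds sits at a fixed offset."""
    class NewClosure(ctypes.Structure):
        _fields_ = [
            ("closure_td", TypeDescPtr),
            ("bound_tds", TypeDescPtr * n_tds),
            ("bound_data", bd_type),
        ]
    return NewClosure

small = ctypes.c_uint8          # one possible instantiation of T
large = ctypes.c_double * 4     # another, much bigger one

for bd_type in (small, large):
    old = old_layout(bd_type, 1)
    new = new_layout(bd_type, 1)
    print(bd_type.__name__, old.bound_tds.offset, new.bound_tds.offset)
</code></pre>

<p>In the old layout the printed offset of <code>bound_tds</code> changes with the
bound data type, while in the new layout it is always one pointer past
the start of the structure.</p>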

<p>Now the other nice thing is that because <code>closure_td</code> represents the closure as a whole,
we can re-use the existing type copying routines.  Basically we are able to say:
&#8220;copy the object whose type is described by <code>closure_td</code>&#8221; and the code will do
the right thing.  If <code>closure_td</code> does not involve generic types, it will generate
purely static code, otherwise it will generate hybrid static/dynamic code.
So basically I use these standard routines to generate the glue functions that take
and drop a closure.</p>

<p>Now, if you have a sendable fn in your Rust code and want to do a deep copy, you
can use these glue functions.  You may be wondering why you have to use the glue functions,
why not just copy it directly? The reason is that when you have a sendable fn, you
don&#8217;t know the precise closure type: it&#8217;s equivalent to a type like <code>closure&lt;?,?&gt;</code>.
However, even that very imprecise type is enough to find the field <code>closure_td</code>,
which is always first, so we can copy any kind of closure by doing:</p>

<pre><code>closure&lt;?,?&gt; *copied_closure = closure;
closure-&gt;closure_td-&gt;take_glue(&amp;copied_closure);
</code></pre>

<p>(That is more or less the signature that our take functions have.) There are a
few other minor changes, but that&#8217;s the gist of it.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Const vs Mutable]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2011/12/13/const-vs-mutable/"/>
    <updated>2011-12-13T19:41:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2011/12/13/const-vs-mutable</id>
    <content type="html"><![CDATA[<p>I keep thinking about parallel blocks although I know I probably
shouldn&#8217;t; but so long as I write these notes while rustc builds,
everybody wins, right?</p>

<p>Anyhow, pcwalton and <a href="http://calculist.org/">dherman</a> yesterday pointed out to me
that <code>const</code> is not exactly one of the most beloved features of C++:
&#8220;const-ification&#8221; is no fun, and if we&#8217;re not careful, Rust could walk
right down that path. To some extent my reaction is, &#8220;Well,
something&#8217;s gotta give.&#8221;  You can&#8217;t have modular static race freedom
without some way to know which function will write what.  But
nonetheless they are quite correct.</p>

<p>I think the problem with <code>const</code> is that it&#8217;s not the default, despite
the fact that most operations are reads.  So perhaps we could make
<code>const</code> the default for Rust: I think this would fit with the general
philosophy of making mutation explicit. In practice what this would
mean is that declarations like <code>x: @T</code>, <code>&amp;&amp;x: T</code>, <code>x: [T]</code> would all
be declaring a pointer <code>x</code> to read-only (i.e., <code>const</code>) data.  This
would be a deep const.  A pointer of type <code>@mut T</code> would permit
modification of the data it points at. (I am not quite sure what to do
with <code>~</code>; since there is no fear of aliasing, permitting mutation
implicitly makes sense to me but it is somewhat inconsistent).</p>

<p>This makes subtle changes in the meaning of existing types.  For
example, <code>[T]</code> no longer means an immutable array but rather what
today would be written as <code>[const T]</code>.  We could add an explicit <code>imm</code>
qualifier if we wanted to allow for immutable data: it would be usable
not only on arrays but also on records and the like, so you could
write <code>[imm T]</code> or <code>@imm T</code>.  I won&#8217;t talk about <code>imm</code> further,
though; the details of its addition are left as an exercise to the
reader (always wanted to write that&#8230;).</p>

<p>An important difference between <code>mut</code> and the other qualifiers is that
<code>mut</code> is shallow where the others are deep.  For example, given a variable
<code>x: @mut T</code> where the type <code>T</code> is defined as:</p>

<pre><code>type T = {
    f: @U, g: @mut U
};
</code></pre>

<p><code>x.f</code> has type <code>@U</code> and <code>x.g</code> has type <code>@mut U</code>.  But if we have a
non-<code>mut</code> variable <code>y: @T</code>, then both <code>y.f</code> and <code>y.g</code> have type <code>@U</code>
(the <code>mut</code> on the field <code>g</code> is lost because <code>y</code> is a const pointer).</p>
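
<p>The rule in this example can be written down as a tiny function.  This
is just my reading of the two cases above expressed in Python; the
helper <code>field_view</code> and the table <code>T_FIELDS</code> are invented for the
sketch and are not part of any compiler.</p>

<pre><code># Which fields of T are declared with a mut pointer: f: @U, g: @mut U.
T_FIELDS = {"f": False, "g": True}

def field_view(pointer_is_mut, field_is_mut, pointee):
    """Type seen when reading a pointer-typed field through a pointer.

    The field keeps its mut qualifier only when the path used to reach
    it is itself mutable; through a const pointer everything is const.
    """
    keeps_mut = pointer_is_mut and field_is_mut
    return ("@mut " if keeps_mut else "@") + pointee

def views(pointer_is_mut):
    """Types of all fields of T as seen through one pointer."""
    return {name: field_view(pointer_is_mut, is_mut, "U")
            for name, is_mut in T_FIELDS.items()}
</code></pre>

<p>Here <code>views(True)</code> corresponds to <code>x: @mut T</code> and <code>views(False)</code> to
<code>y: @T</code>.</p>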

<p>One final note: in comparison to C, I have only discussed the <code>mut</code>
qualifier in pointer types.  For example, <code>@mut T</code> and <code>[mut T]</code>.
Types like <code>const int</code> that you can find in C don&#8217;t make sense to me:
it&#8217;s a value, it can&#8217;t be modified, only overwritten.  That seems like
a property of the lvalue not the type.  But if you want to interpret
things in the C way, then you could say that types are mutable by
default, but pointers are const by default.  So the type <code>int</code> is a
mutable int (i.e., a variable which can be changed), but the type
<code>@int</code> is a const pointer to an int.  I don&#8217;t think this is a very
important point, personally; it sounds like it&#8217;s deep but I think it&#8217;s
not.</p>

<p>Ok, this is just a sketch, but it&#8217;s enough for me to remember what I
was thinking later.  I am pretty sure this actually hangs together
rather nicely (I thought about it a lot on the train and went through
various iterations).  I am not sure how it would feel to program in,
but I suspect it would be very nice. More to come in later posts I&#8217;m
sure.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Partially ordered unique closures]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2011/12/13/partially-ordered-unique-closures/"/>
    <updated>2011-12-13T10:09:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2011/12/13/partially-ordered-unique-closures</id>
    <content type="html"><![CDATA[<p>On a call with other Rust developers, I realized that I was thinking about
unique closures all wrong.  I had in mind a total ordering:</p>

<pre><code>fn[send] &lt;: fn &lt;: block
</code></pre>

<p>but of course this is not necessary.  What is desirable is a partial ordering:</p>

<pre><code>fn[send] &lt;: block
fn &lt;: block
</code></pre>

<p>just as <code>~</code> and <code>@</code> pointers can both be aliased using a reference.
Ironically, this is precisely what I proposed in my list of possible
solutions, but I did so using region terminology.  Embarrassingly
obvious, in retrospect, particularly as that was Graydon&#8217;s original
design I believe.  I think I got confused by the total ordering of
kinds into thinking that this should translate to a total ordering of
functions that close over data in those kinds.  Anyhow, I will now
work on implementing unique closures in this partially ordered way,
and hopefully things will go more smoothly!</p>
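
<p>To spell out what the partial ordering does and does not allow, here is
a small Python sketch.  The table <code>SUBTYPE_EDGES</code> and the function
<code>is_subtype</code> are purely illustrative; they encode only the two
relations listed above plus reflexivity.</p>

<pre><code># The two subtyping relations of the partial ordering.
SUBTYPE_EDGES = {("fn[send]", "block"), ("fn", "block")}

def is_subtype(sub, sup):
    # Every type is a subtype of itself.  There are no chains in this
    # ordering, so no transitive closure is needed.
    return sub == sup or (sub, sup) in SUBTYPE_EDGES
</code></pre>

<p>Both <code>fn[send]</code> and <code>fn</code> can be used where a <code>block</code> is expected, but
neither can stand in for the other, which is what frees <code>fn[send]</code> to
use a different representation from <code>fn</code>.</p>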
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Challenges implementing unique closures]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2011/12/12/challengines-implementing-unique-closures/"/>
    <updated>2011-12-12T21:25:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2011/12/12/challengines-implementing-unique-closures</id>
    <content type="html"><![CDATA[<p><strong>Update:</strong> See the recent post addressing
<a href="http://smallcultfollowing.com/babysteps/blog/2011/12/13/partially-ordered-unique-closures/">the solution to this problem</a>.</p>

<p>I have been trying to implement unique closures&#8212;or sendable
functions, as I prefer to call them&#8212;but I realized that there is
a fundamental problem that I hadn&#8217;t thought of before.   The problem
stems from two contradictory design goals:</p>

<ul>
<li>Sendable functions should be movable to another task without copying</li>
<li>The various function types should have a subtyping relationship</li>
</ul>


<p>The first requirement really demands that the sendable function&#8217;s
environment be stored with a unique pointer.  Otherwise multiple
threads could share access to the same mutable state. Uncool.</p>

<p>The second requirement, however, demands that sendable functions
should have the same representation as our other closure types&#8212;that
is, they should be represented as the pair of a function pointer and a
boxed environment.</p>

<p>Clearly something has to give.  I see various options.</p>

<h4>Copy when sending to another task</h4>

<p>When sending a sendable function elsewhere, we could do a deep copy of
the closure contents. However, we would want to allocate this copy in
the target task&#8217;s heap.  brson pointed out that there is a potential
race condition of sorts, in that the target task might die before the
message is constructed and sent. And in any case it just feels weird
to have one task allocate in another task&#8217;s heap. The proper way to do
this kind of thing is to use the exchange heap, as we do with unique
pointers, but we can&#8217;t do that and preserve the subtyping
relationship. There is also the fact that I think it should <em>always</em>
be possible to send without copies, if you are willing to use unique
pointers.</p>

<h4>Remove the subtyping relationship</h4>

<p>If <code>fn[send]</code> were not a subtype of <code>fn</code>, then a lot of our problems would go
away.  But if we go down <em>that</em> route, then we end up with a lot of
different kinds of functions; we can no longer unify bare functions
and sendable functions, since bare functions <em>must</em> be usable as
closures.  I have a personal goal of keeping the number of function
types to three or fewer, analogous with <code>~</code>, <code>@</code>, and <code>&amp;</code>.</p>

<h4>Coroutines or procedures</h4>

<p>This is basically what I half-proposed in
<a href="http://smallcultfollowing.com/babysteps/blog/2011/12/06/coroutines-for-rust/">my previous post about procedures/coroutines</a>.  I like this
idea but it&#8217;s a bit of a departure from what we have; if we did it, I
would probably want to go &#8220;whole hog&#8221; and support coroutines, though I
think for 0.1 starting with one-shot procedures would be more
realistic.  One thing I like about it is that we can reduce down to
two function types: <code>fn(T)-&gt;U</code> and <code>block(T)-&gt;U</code>.  <code>fn(T)-&gt;U</code> would be
used for (what are today) bare functions and lambdas. <code>block(T)-&gt;U</code>
would be used for blocks and would also be compatible with <code>fn(T)-&gt;U</code>.</p>

<h4>Move the bound into the pointer type</h4>

<p>Using coroutines, we can get ourselves down to two function types.
But there is a way to get down to one. Currently, a function type is
actually a pair of the function pointer and the environment.  Another
option would be to make a closure be a single pointer to the
environment, and then to embed the function pointer into the
environment.  Thus the type for <em>all</em> closures would be <code>fn(T)-&gt;U</code>.
This would be a <a href="http://smallcultfollowing.com/babysteps/rust/no-implicit-copies#dynamically-sized-types">dynamically sized type</a> and hence subject to
various limitations; in particular, it could only be referenced by
pointer.  We could then use the type of the pointer to also encode the
bound:</p>

<ul>
<li>A unique function pointer <code>~fn(T)-&gt;U</code> can only close over sendable state.</li>
<li>A shared function pointer <code>@fn(T)-&gt;U</code> can close over task-local state.</li>
<li>A by-ref function pointer <code>&amp;fn(T)-&gt;U</code> can close over arbitrary state.</li>
</ul>


<p>Furthermore, the borrowing and aliasing rules would naturally allow
<code>~fn(T)-&gt;U</code> and <code>@fn(T)-&gt;U</code> to be used as <code>&amp;fn(T)-&gt;U</code>, but only for a
limited time, etc.</p>
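
<p>As a sketch of how the pointer type could carry the bound, consider the
following Python.  It assumes that the kinds nest (anything sendable is
also acceptable as task-local state, and so on), which the list above
does not state outright; the table <code>ALLOWED</code> and both functions are
hypothetical.</p>

<pre><code># Kinds of state each pointer type may close over (assumed to nest).
ALLOWED = {
    "~": {"sendable"},
    "@": {"sendable", "task-local"},
    "&": {"sendable", "task-local", "arbitrary"},
}

def may_close_over(sigil, captured_kinds):
    """True if a closure behind this pointer may capture all the kinds."""
    return all(kind in ALLOWED[sigil] for kind in captured_kinds)

def can_borrow_as(source, target):
    """True if everything the source may capture is fine for the target."""
    return ALLOWED[source].issubset(ALLOWED[target])
</code></pre>

<p>Under these assumptions both <code>~fn</code> and <code>@fn</code> can be borrowed as <code>&amp;fn</code>,
but <code>@fn</code> cannot be treated as <code>~fn</code>.  Note that the literal <code>"&"</code> in
the table is written unescaped for readability of the Python; in the
page source it is the by-ref sigil.</p>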

<p>I personally think the system that results from this is easy to
explain.  However, it has some downsides:</p>

<ul>
<li>it relies on the <a href="http://smallcultfollowing.com/babysteps/rust/no-implicit-copies#dynamically-sized-types">no implicit copies (NIC) proposal</a> and
<a href="https://github.com/graydon/rust/wiki/Proposal-for-regions">regions</a>, neither of which have been accepted, much less
implemented;</li>
<li>it means that calling a closure requires first loading the function pointer
out of the environment and then calling indirectly, which is slower;</li>
<li>it merges the function bound and the kind of pointer used to access
the function.  I see this as a positive (one less concept to
explain) but others may disagree.  I do not believe any
expressiveness is lost by this approach, in any case.</li>
</ul>


<p>In any case, the fact that it relies on proposals that have not been
implemented and will not be implemented for 0.1 makes it not a viable
option.</p>

<h4>None of the above</h4>

<p>The route I am currently looking into is much more conservative.  We
basically do not support unique closures or any other alternative for
0.1.  Instead, we address the particular pain point that started this
quest: generic bare functions.  Right now, generic functions take
implicit arguments called type descriptors, which are basically a form
of reflection. These type descriptors are always local to a thread.  I
am going to see how difficult it is to make them global. This would
allow generic bare functions to be bound to specific types but remain
bare functions. (If we do decide to &#8220;monomorphize&#8221;, as is under
discussion, then this becomes a non-issue, as type descriptors are no
longer needed.)</p>

<h4>And what of the future?</h4>

<p>Assuming we take the none-of-the-above option, what about the next
version of Rust? Hard to say. Personally speaking, I find the first
two options to be non-starters for the reasons stated previously.  The
third option (coroutines) seems good to me if we can make them fast
enough to use for iterators (which I think we can).  The fourth option
(pointer type is bound) appeals to me as well because it brings us
down to <em>one type of function</em> (<code>fn(T)-&gt;U</code>) without any loss of
expressiveness.  We would probably want to see what performance impact
it has and if there are ways to mitigate it.  In any case, there is a
long road between here and the fourth option, since we have to
implement the <a href="http://smallcultfollowing.com/babysteps/rust/no-implicit-copies#dynamically-sized-types">NIC proposal</a> and fully spec out and implement
<a href="https://github.com/graydon/rust/wiki/Proposal-for-regions">region support</a>.  However, the third and fourth options are not
incompatible, so maybe we will see both someday.</p>
]]></content>
  </entry>
  
</feed>

