<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Baby Steps]]></title>
  <link href="http://smallcultfollowing.com/babysteps/atom.xml" rel="self"/>
  <link href="http://smallcultfollowing.com/babysteps/"/>
  <updated>2012-05-14T20:22:38-07:00</updated>
  <id>http://smallcultfollowing.com/babysteps/</id>
  <author>
    <name><![CDATA[Nicholas D. Matsakis]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[Vectors, slices, and functions, oh my!]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/05/14/vectors/"/>
    <updated>2012-05-14T06:36:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/05/14/vectors</id>
    <content type="html"><![CDATA[<p>I wanted to bring together the various ideas around vectors and
function types into one post.  The goals of these changes are</p>

<ol>
<li>to achieve orthogonality of the pointer types, so that leading <code>&amp;</code>,
<code>@</code>, and <code>~</code> sigils are the only way to indicate the kind of
pointer that is in use;</li>
<li>to help pare down the proliferation of subtle variations on
types, such as the 5 different function types currently available.</li>
</ol>


<h2>The proposal</h2>

<p>The Rust type system would be described by the following grammar.  In
this grammar, I have included all optional portions except for region
bounds.  I indicated those types which could have a lifetime bound
associated with them by writing <code>(/&amp;r)</code> in the description (a lifetime
bound indicates the lifetime of any pointers embedded within the type
itself; this is not related to the changes I am discussing here so I
won&#8217;t go into detail):</p>

<pre><code>M = mut | const | ""                  // immutable by default

K = send | copy | move | ""           // move by default

T = () | int | uint | float | ...     // scalar types
  | @MT | ~MT | &amp;r.MT | *MT           // pointer types
  | id&lt;T*&gt;                            // enum, resource, class (/&amp;r)
  | id                                // type variable
  | [T]                               // slice type (/&amp;r)
  | substr                            // string slice type (/&amp;r)
  | (T*)                              // tuple type
  | (T "*" N)                         // fixed-size array
  | {(M id: T)*}                      // anonymous record types
  | U                                 // dynamically sized types

U = fn:K(T*) -&gt; T                     // closure types (/&amp;r)
  | id:K&lt;T*&gt;                          // iface instances (/&amp;r)
  | vec&lt;MT&gt;                           // vector type
  | str                               // string type
</code></pre>

<h2>Dynamically sized types</h2>

<p>The types described by <code>U</code> are separated out because, unlike the other
types listed, they have &#8220;dynamic size&#8221;&#8212;that is, the size of an
instance of <code>U</code> will vary from instance to instance.  As a result, the
<code>U</code> types are somewhat &#8220;second-class&#8221; when compared to the other
types:</p>

<ul>
<li>Type variables cannot be bound to dynamically sized types.</li>
<li>Expressions whose type is a dynamically sized type are generally prohibited.</li>
</ul>


<p>There is one exception to these rules.  Literal expressions of
dynamically sized types are permitted, as the compiler can readily
compute their size.  The literal forms of the various types <code>U</code> are:</p>

<pre><code>Type            Literal form
----            ------------
fn:K(T*) -&gt; T   fn:K(x, y) -&gt; T { ... }
fn:K(T*) -&gt; T   id (where id is a fn item)
id:K&lt;T*&gt;        iface(v)
vec&lt;MT&gt;         [M ...]
str             "..."
</code></pre>

<p>If it seems useful, we could lift the restriction that type variables
cannot be bound to dynamically sized types and instead use some sort
of kind to mark variables that may accept dynamically sized types (or
to mark those that may not, depending on what we feel the defaults
ought to be).</p>

<h2>Vectors and slices</h2>

<p>These basically work in the same way as currently proposed, but the
syntax has changed.  A vector is written <code>vec&lt;T&gt;</code>; note that unlike
other types, vector type parameters have a mutability, so you might
have <code>vec&lt;mut u8&gt;</code>, for example.</p>

<p>Slices can be created by using the <code>[:]</code> operator, which works just as
in Python.  So <code>[1, 2, 3][-1:]</code> returns a slice containing <code>[3]</code>,
<code>[1, 2, 3][1:-1]</code> returns <code>[2]</code> and <code>[1, 2, 3][2:1]</code> returns an empty
slice.  The slice operator can be applied to both vectors and slices.
We could conceivably allow it to be overloaded.</p>
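
<p>To spell out the Python-style indexing rules, here is a minimal sketch in
present-day Rust rather than the 2012 dialect used in this post.  The helper
names <code>clamp_index</code> and <code>py_slice</code> are made up for illustration and
are not part of the proposal.  Negative indices count from the end,
out-of-range indices are clamped, and a start at or past the end yields an
empty slice.</p>

<pre><code>/// Hypothetical helper: resolve a Python-style index against a length.
fn clamp_index(i: isize, len: usize) -&gt; usize {
    let len = len as isize;
    // Negative indices count back from the end of the sequence.
    let i = if i &lt; 0 { i + len } else { i };
    // Anything still out of range is clamped to [0, len].
    i.clamp(0, len) as usize
}

/// Hypothetical helper mirroring `v[start:end]`; `None` means "omitted".
fn py_slice&lt;T&gt;(v: &amp;[T], start: Option&lt;isize&gt;, end: Option&lt;isize&gt;) -&gt; &amp;[T] {
    let lo = clamp_index(start.unwrap_or(0), v.len());
    let hi = clamp_index(end.unwrap_or(v.len() as isize), v.len());
    // An inverted range is simply empty, as in Python.
    if lo &gt;= hi { &amp;v[0..0] } else { &amp;v[lo..hi] }
}
</code></pre>

<p>Under these assumptions, <code>py_slice(v, Some(1), Some(-1))</code> corresponds to
<code>v[1:-1]</code> and <code>py_slice(v, Some(-1), None)</code> to <code>v[-1:]</code>.</p>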

<p>Vectors may be added to slices.  The type of the resulting vector is
taken from the left-hand side.  So adding a <code>@vec&lt;mut u8&gt;</code> to a <code>[u8]</code>
yields a (longer) <code>@vec&lt;mut u8&gt;</code>.</p>

<h2>Fixed-length arrays</h2>

<p>The type <code>(T * N)</code> represents a fixed-length array.  Here <code>T</code> is
another type and <code>N</code> is a constant expression.  This is primarily
intended for C compatibility: a fixed-length array has no length field
and is simply represented by <code>N</code> instances of the type <code>T</code> laid out
one after the other.  In most ways it is precisely equivalent to a
tuple.  There is no literal form for such arrays: they are in fact
supertypes of tuples of equivalent size, and so share the tuple syntax
<code>(v1, ..., vN)</code>.  We can introduce a macro for repeating the same
element <code>N</code> times to avoid repetition.</p>
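
<p>As a point of comparison only: present-day Rust spells fixed-length arrays
<code>[T; N]</code> rather than <code>(T * N)</code>, but they likewise carry no length field
and consist of <code>N</code> values laid out back to back.  A minimal sketch, where
<code>packed_size</code>, <code>has_no_header</code> and <code>repeat4</code> are hypothetical helpers:</p>

<pre><code>use std::mem::size_of;

/// Hypothetical helper: bytes needed for `N` values of `T` laid out back to back.
fn packed_size&lt;T, const N: usize&gt;() -&gt; usize {
    N * size_of::&lt;T&gt;()
}

/// True when the array type stores nothing beyond its elements.
fn has_no_header&lt;T, const N: usize&gt;() -&gt; bool {
    size_of::&lt;[T; N]&gt;() == packed_size::&lt;T, N&gt;()
}

/// The built-in repeat form plays the role of a "repeat N times" macro.
fn repeat4(x: u8) -&gt; [u8; 4] {
    [x; 4]
}
</code></pre>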

<p>Fixed-length arrays are indexable and sliceable but their contents are
not modifiable.  If modification is desired one can create a simple
one-entry record.</p>

<p><em>Note:</em> I think this idea of having fixed-length arrays and tuples be
closely related makes sense.  I&#8217;m mostly trying to keep things simple
and not introduce too much machinery for an edge case.  But maybe
there is a problem with it.</p>

<h2>Closure and iface instance bounds</h2>

<p>Both the closure and iface instance types feature a bound <code>K</code> called a
&#8220;kind bound&#8221;.  These types are unlike the other types because they are
&#8220;opaque&#8221; to the compiler: that is, the compiler does not know the
types of the data that is contained within.  The bound <code>K</code> puts a
restriction on those types so that additional operations can be
permitted.</p>

<p>For example, if you have a closure <code>x</code> of type <code>fn:send()</code>, then the
compiler knows that whatever data is closed over by <code>x</code> is sendable.
The compiler can therefore permit <code>x</code> to be sent between tasks.
Similarly the iface type <code>to_str:send</code> describes an instance of some
type which is sendable and implements the <code>to_str</code> interface.</p>
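
<p>The same idea can be seen in present-day Rust as a <code>Send</code> bound on a
boxed closure, which gives a rough feel for what <code>~fn:send()</code> is meant to
express.  This is a sketch in modern syntax, not the proposed one, and
<code>run_on_thread</code> is a hypothetical helper:</p>

<pre><code>use std::thread;

/// Hypothetical helper: run a sendable, owned closure on another thread.
/// The closure's captured data is opaque here, but the `Send` bound tells
/// the compiler that whatever was captured may cross a thread boundary.
fn run_on_thread(f: Box&lt;dyn FnOnce() -&gt; i32 + Send&gt;) -&gt; i32 {
    thread::spawn(f).join().expect("worker thread panicked")
}
</code></pre>

<p>Without the <code>+ Send</code> bound in the parameter type, the call to
<code>thread::spawn</code> would be rejected at compile time, which is the kind of
check the kind bound <code>K</code> is meant to enable.</p>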

<p>If no bound is specified, the default is <code>move</code> (which is the most
general).  This simply states that the closure may close over
arbitrary data.</p>

<p>As today, the &#8220;sugared closure&#8221; form <code>{|x| ...}</code> would be inferred to
some form of &#8220;pointer to closure&#8221;.  That is, it could result in <code>@fn</code>,
<code>~fn</code> or <code>&amp;fn</code>, depending on the expected type.</p>

<p>There is no type that represents a &#8220;closure that accesses variables by
reference and not by copy&#8221;.  Sugared closures become the only way to
construct such a closure: if the expected type is <code>&amp;fn()</code> they will
construct an &#8220;access-by-reference&#8221; closure and if the expected type
is <code>@fn()</code> or <code>~fn()</code> they will not.  This seems non-ideal but
equivalent to the situation today.</p>

<h3>Bare functions</h3>

<p>The type of &#8220;bare functions&#8221; (that is, function items which do not
close over anything) is simply <code>fn:send(T*) -&gt; T</code>.  To use a bare
function as a closure, you must prefix the bare function with an
appropriate sigil (<code>&amp;</code>, <code>@</code>, or <code>~</code>).</p>

<p>For example, the following snippet uses a function <code>inc()</code> as the
argument to <code>vec::map</code>:</p>

<pre><code>fn inc(x: int) -&gt; int { x + 1 }

fn inc_all(vs: [int]) -&gt; [int] {
     vec::map(vs, &amp;inc)
}
</code></pre>

<p>Here the expression <code>&amp;inc</code> has the (fully elaborated) type
<code>&amp;static.fn:send(int) -&gt; int</code>.  This is a subtype of the expected type
that <code>vec::map</code> requires: <code>&amp;fn:move(int) -&gt; int</code>.</p>
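
<p>Present-day Rust has a close analogue of this coercion: a reference to a
function item can be used where a reference to an arbitrary closure is
expected.  A sketch in modern syntax (<code>map_ints</code> stands in for
<code>vec::map</code> and is not a real library function):</p>

<pre><code>fn inc(x: i32) -&gt; i32 { x + 1 }

/// Hypothetical stand-in for `vec::map`, accepting any borrowed closure.
fn map_ints(vs: &amp;[i32], f: &amp;dyn Fn(i32) -&gt; i32) -&gt; Vec&lt;i32&gt; {
    vs.iter().map(|&amp;v| f(v)).collect()
}

fn inc_all(vs: &amp;[i32]) -&gt; Vec&lt;i32&gt; {
    // `&amp;inc` is a reference to a function that closes over nothing;
    // it is accepted where a reference to any closure is expected.
    map_ints(vs, &amp;inc)
}
</code></pre>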

<p>To send a bare function between tasks you might write:</p>

<pre><code>fn task_body() { ... }

fn spawn_task() {
    task::spawn(~task_body)
}
</code></pre>

<h3>Representing closures</h3>

<p>The representation of closures will change somewhat.  Previously, a closure
was the pair of a function pointer with an environment pointer.  Now a
closure will be a pointer to a structure like:</p>

<pre><code>struct closure {
    void *fptr,
    type_desc *td,
    ... // (closed over data)
};
</code></pre>

<p>I thought at first that LLVM might not be able to track a function
pointer in this case, but experiments suggest that it can.  In
general, LLVM does a good job of tracking and constant propagating
alloca&#8217;d memory with precision.</p>

<p>Conceivably, it might be slower to perform an extra load before the
call (that is, to call <code>c-&gt;fptr</code> and not <code>c.fptr</code>) if the closure
pointer is not in cache.  But reasoning about the cache without
experimenting is always risky, and this particular load seems unlikely
to matter, as the closure will soon be accessing its environment, and
in that case you&#8217;d have to bring the environment into cache anyhow.</p>

<p>There is one unambiguous, if minor, downside.  In the old scheme, bare
functions used as a closure of type <code>fn@</code> or <code>fn~</code> could pair the
function pointer with a NULL environment.  But in this new scheme a
<code>@fn</code> or <code>~fn</code> will require allocation, because the runtime will
expect to be able to free such pointers like any other pointer.  Such
pointers are quite rare though compared to <code>&amp;fn</code> types, and something
like <code>&amp;inc</code> can be pre-allocated statically (actually we could use
tagged pointer tricks, I suppose, but it doesn&#8217;t seem worthwhile).</p>

<h2>Pros and cons</h2>

<p>To me the real question is whether the system feels simpler on net
given the introduction of dynamic size types.  I think it does, but
obviously this is a subjective question.  To me, the benefits of
the following are pretty substantial:</p>

<ul>
<li>one function type instead of five;</li>
<li>types like <code>@fn(int) -&gt; uint</code> and <code>@vec&lt;int&gt;</code> seem to have a clear
meaning once you are accustomed to <code>@</code> meaning pointer, vs <code>fn@(int)
-&gt; uint</code> and <code>[int]/@</code>;</li>
<li>the difference between vectors and slices (and strings and substrings)
is clear, whereas currently I think there is plenty of room for confusion.</li>
</ul>

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Iface types]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/05/08/iface-types/"/>
    <updated>2012-05-08T09:03:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/05/08/iface-types</id>
    <content type="html"><![CDATA[<p>Yesterday I wrote about my scheme for paring down our set of function
types to one type, <code>fn:kind(S) -&gt; T</code>.  When I finished writing the
post, I was feeling somewhat uncertain about the merits of the idea,
but I&#8217;m feeling somewhat better about it today.  I really like the
idea that top-level items have the type <code>fn:kind(S) -&gt; T</code> and that you
therefore give them an explicit sigil to use them in an expression;
this allows us to remove the &#8220;bare function&#8221; type altogether without
any complex hacks in the inference scheme.</p>

<p>Anyway, I didn&#8217;t talk at all about iface types yesterday, but they
have a place in this scheme too.  An iface type, also called a boxed
iface, is basically the pair of a vtable with a <code>self</code> pointer.  Today
this is hard-coded to be a GC&#8217;d ptr (<code>@</code>), but I want to change this
as it is very limited: iface types are relatively expensive to
construct (requiring allocation, RC overhead, etc) and they cannot be
sent between tasks.</p>

<p>Under my proposal, an iface type would be written <code>id:kind</code> where <code>id</code>
is the name of the interface and <code>kind</code> is an optional kind bound that
applies to the receiver.  The type is dynamically sized, because the
value that is represented is something like:</p>

<pre><code>struct iface_instance {
    void *vtable;
    type_desc *td;
    ... // self data is represented inline
}
</code></pre>

<p>This proposal therefore allows you to construct things like:</p>

<pre><code>@id      (today's "boxed iface")
&amp;id      (an iface instance allocated on the stack)
~id:send (a sendable iface instance)
</code></pre>
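
<p>For a rough modern analogue, present-day Rust trait objects can likewise
sit behind different pointer types.  In the sketch below <code>Rc</code> approximates
<code>@</code>, a plain reference approximates <code>&amp;</code>, and <code>Box</code> with a <code>Send</code>
bound approximates <code>~id:send</code>.  The function names are invented for
illustration, and the mapping is an analogy rather than the design proposed
here.</p>

<pre><code>use std::rc::Rc;
use std::thread;

/// Shared, task-local instance (roughly `@to_str`).
fn describe_shared(v: Rc&lt;dyn ToString&gt;) -&gt; String {
    v.to_string()
}

/// Borrowed instance, which may live on the caller's stack (roughly `&amp;to_str`).
fn describe_borrowed(v: &amp;dyn ToString) -&gt; String {
    v.to_string()
}

/// Owned, sendable instance (roughly `~to_str:send`), used on another thread.
fn describe_sendable(v: Box&lt;dyn ToString + Send&gt;) -&gt; String {
    thread::spawn(move || v.to_string())
        .join()
        .expect("worker thread panicked")
}
</code></pre>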

<h4>New interface instance construction syntax</h4>

<p>There is one other change I&#8217;d like to make, which is independent but
seems to fit.  Today, iface types are constructed using <code>as</code>.  I am
not crazy about this because <code>as</code> is normally our type cast operator,
but iface type construction is not a type cast.  It may perform
allocation etc.  The <code>as</code> construction is also very wordy, requiring
one to specify the desired iface rather than having it inferred, and
it has an awkward requirement that <code>::</code> be used for any type
parameters on the type.</p>

<p>As a replacement I propose we make use of the <code>iface</code> keyword in
expressions, so that <code>iface::&lt;T&gt;(v)</code> would construct an instance of
the iface type <code>T</code> for the value <code>v</code>.  Like all type parameters, <code>T</code>
may be left off and inferred from context.  So typically you would
just write <code>iface(v)</code>, as in this example (here I assume the current
iface types, rather than the ones I will describe shortly):</p>

<pre><code>iface an_iface&lt;T&gt; { ... }
impl of an_iface&lt;int&gt; for int { ... }
fn foo(i: an_iface&lt;int&gt;) { ... }
fn bar(i: int) { foo(iface(i)) }
</code></pre>

<p>In contrast, the fn <code>bar</code> in the old syntax looks like:</p>

<pre><code>fn bar(i: int) { foo(i as an_iface::&lt;int&gt;) }
</code></pre>

<p>At first I wanted to make ifaces into constructor functions with a signature
like:</p>

<pre><code>fn an_iface&lt;I:an_iface&gt;(i: I) -&gt; an_iface
</code></pre>

<p>but this doesn&#8217;t fit with my proposal above: if the type <code>an_iface</code>
is a type of dynamic size, it cannot be returned (also, how does
one specify the sendability bounds?  They would have to be added as
bounds to the type <code>I</code>, etc).</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Fn types]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/05/07/fn-types/"/>
    <updated>2012-05-07T16:27:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/05/07/fn-types</id>
    <content type="html"><![CDATA[<p>As you loyal readers know, I am on a quest to make the Rust type
system more <em>orthogonal</em> with respect to the kind of pointer in use,
by which I mean that I want to have the three pointer sigils (<code>@</code>,
<code>&amp;</code>, and <code>~</code>) indicate where memory is located and the other types
indicate what value is to be found at that memory.  Right now there
are a few cases where we conflate the two things into one type.  The
first, vectors and slices, I discussed in a recent post.  This post
discusses the second case: function and interface types.</p>

<p>I&#8217;ll sketch out the idea; there are however some details I have yet to
work out.  Actually the plan imposes some downsides and I&#8217;m not 100%
sure if I&#8217;m in favor yet.  Though I think the simplicity of the type
system is a win (simplicity for people using it, that is).</p>

<h2>Background</h2>

<p>We currently have five function types:</p>

<ul>
<li><code>fn@(S) -&gt; T</code>, or &#8220;boxed closure&#8221;;</li>
<li><code>fn~(S) -&gt; T</code>, or &#8220;unique closure&#8221;;</li>
<li><code>fn&amp;(S) -&gt; T</code>, or &#8220;stack closure&#8221;;</li>
<li><code>fn(S) -&gt; T</code>, or &#8220;any closure&#8221;;</li>
<li><code>native fn(S) -&gt; T</code>, or &#8220;bare function&#8221;.</li>
</ul>


<p>What distinguishes these closures is both the kind of pointer in which
their environment is stored as well as the kind of data which can be
stored in the closure itself:</p>

<ul>
<li>The boxed closure (<code>fn@</code>) uses a normal GC&#8217;d pointer (oft called a
boxed pointer, but I am trying to move to more descriptive
terminology) to store its environment. The environment may contain
arbitrary values but it does not contain any references to the
stack.  Copying a boxed closure is cheap because the environment can
be aliased (currently, this means that the environment is ref
counted).</li>
<li>The unique closure (<code>fn~</code>) uses a unique pointer to store its
environment.  All data must be <em>sendable</em>, which basically means
tree-shaped.  Copying a unique closure is expensive, because the
environment cannot be aliased, and so copying the closure results in
a deep copy of all the closed-over data.  This has the upside that
unique closures can be sent between tasks.</li>
<li>The stack closure (<code>fn&amp;</code>) uses a reference to store its environment.
Unlike the other closures, no data is stored in the environment
itself.  Rather, the environment consists of pointers to an outer
stack frame.  Copying a stack closure is <em>very</em> cheap.  Because
<code>fn&amp;</code> contains by-ref pointers into its parent stack frame, it is
restricted in where it can appear (though these restrictions can
probably be loosened somewhat once the work on lifetimes is complete).</li>
<li>The any closure (<code>fn</code>) is just the supertype of the other three.  It
commonly appears in function signatures where any sort of closure
would be acceptable (<code>vec::each()</code>, for example).</li>
<li>A bare function (<code>native fn</code>) is just a function pointer with no
environment at all (it is therefore not a closure at all).  It is
never used in practice, as it is in fact a subtype of the various
closure types.</li>
</ul>


<p>One important detail is that closures are represented as the pair of a
function pointer along with a pointer to the environment.  In the case
of a bare function, the pointer to the environment is always <code>NULL</code>.</p>

<h2>The proposal</h2>

<p>I want to have just one function type.  In practice, as today, this
would most commonly be written as <code>fn(S) -&gt; T</code>, but the type in its
fully explicit glory would be:</p>

<pre><code>fn:kind(S) -&gt; T
</code></pre>

<p>Like my proposed vector type <code>vec&lt;T&gt;</code>, this function type has an
unknown static size.  At runtime, it would be represented by a
structure like:</p>

<pre><code>struct fn {
    void *code_ptr;
    ... // environment data is stored inline
};
</code></pre>

<p>As with <code>vec&lt;T&gt;</code>, the type <code>fn(S) -&gt; T</code> has an unknown size and
therefore must basically always be referenced by pointer (<code>@fn(S) -&gt;
T</code>, <code>~fn(S) -&gt; T</code>, <code>&amp;fn(S) -&gt; T</code>).</p>

<p>The <code>kind</code> portion of the function type indicates a bound on the
closed over data.  It can be <code>copy</code> or <code>send</code>.  If it is omitted, then
there is no bound.  In practice, I imagine that <code>send</code> is the only
kind that would ever be useful.</p>

<p>The mapping between the current function types and my proposal would be:</p>

<pre><code>fn@(S) -&gt; T         becomes        @fn(S) -&gt; T
fn~(S) -&gt; T         becomes        ~fn:send(S) -&gt; T
fn&amp;(S) -&gt; T         becomes        &amp;fn(S) -&gt; T
fn(S) -&gt; T          becomes        &amp;fn(S) -&gt; T
native fn(S) -&gt; T   just goes away
</code></pre>

<h2>Details</h2>

<h3>Literal syntax</h3>

<p>I can imagine a couple of alternatives here.  The basic issue is how to
distinguish between &#8220;closures that reference things on the stack
frame&#8221; and &#8220;closures that copy things out of the stack frame&#8221;.</p>

<p>I think my preferred solution is to say that the explicit <code>fn</code> form
<em>always</em> copies out of the stack frame.  So something like:</p>

<pre><code>let foo = fn@(x: int, y: int) -&gt; int { x + y };
</code></pre>

<p>would become</p>

<pre><code>let foo = @fn(x: int, y: int) -&gt; int { x + y };
</code></pre>

<p>Note that there is no bound specified (indicating no bound on the
closed-over data).  Of course, any data that gets copied into the
closure must be copyable; but if data is <em>moved</em> into the closure (for
example, if it is the last use of the data, or an explicit capture
clause is used), then the data can have any kind.  This is the same as
today.</p>

<p>If we wanted a unique closure, which today is written:</p>

<pre><code>let foo = fn~(x: int, y: int) -&gt; int { x + y };
</code></pre>

<p>you would write</p>

<pre><code>let foo = ~fn:send(x: int, y: int) -&gt; int { x + y };
</code></pre>

<p>This is somewhat wordier than today, but the truth is that we rarely
(if ever) write unique closures by hand.  Instead, you employ the
sugared closure form.</p>

<h3>Sugared closures</h3>

<p>Sugared closures are written using a Ruby-like notation:</p>

<pre><code>for vec.each { |item| ... }    // inferred to stack closure

task::spawn {|| ... }          // inferred to unique closure
</code></pre>

<p>They would continue to operate as today.  This means that we&#8217;ll infer
the kind of closure pointer and other facts based on the expected
type.  I am still not crazy about this inference (although I put it
in) but the last time I proposed taking it out this was unpopular.</p>

<h3>Bare functions</h3>

<p>One advantage of the current approach is that a bare function (which
is just a function pointer) can be converted into a closure by pairing
it with a null pointer.  This no longer works under this system except
for region pointers, so when bare functions are converted to <code>@</code> or
<code>~</code> functions we&#8217;d have to allocate a little stub to convert the call.
Maybe the sigil should be written explicitly, so that function items
have a type of <code>fn(S) -&gt; T</code>.  You would then write <code>vec::iter(v,
&amp;foo)</code> to apply the top-level function <code>foo</code> to each item in the
vector, for example.  Hmm.</p>

<h2>Summary</h2>

<p>So yeah, that&#8217;s the rough idea.  I feel like the current system does
work more smoothly in some regards, so I&#8217;m not yet sure if the idea is
overall a win, but I wanted to note it down.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Borrowing errors]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/05/05/borrowing-errors/"/>
    <updated>2012-05-05T05:37:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/05/05/borrowing-errors</id>
    <content type="html"><![CDATA[<p>I implemented a simple, non-flow-sensitive version of the reference
checker which I described in <a href="http://smallcultfollowing.com/babysteps/blog/2012/05/01/borrowing">my previous post</a>.  Of course it
does not accept the Rust codebase; however, the lack of
flow-sensitivity is not the problem, but rather our extensive use of
unique vectors.  I thought I&#8217;d write a post first showing the problem
that you run into and then the various options for solving it.</p>

<h2>Errors</h2>

<p>The single most common error involves <code>vec::len()</code>.  There are many
variations, but mostly it boils down to code like this, taken
from the <code>io</code> package:</p>

<pre><code>type mem_buffer = @{mut buf: [mut u8],
                    mut pos: uint};

impl of writer for mem_buffer {
    fn write(v: [const u8]/&amp;) {
        if self.pos == vec::len(self.buf) { ... }
        ...
    }
}
</code></pre>

<p>The problem lies in <code>vec::len(self.buf)</code>.  This is considered illegal
because the vector <code>self.buf</code> resides in a mutable field of a
task-local box.  Therefore, the algorithm assumes that <code>vec::len()</code>
may have access to it and could, potentially, mutate it, which would
cause the vector to be freed.  Bad.  This call would be fine if the
field <code>buf</code> were not mutable.  In that case, even if <code>vec::len()</code> had
access to the <code>mem_buffer</code>, it would not be able to overwrite the
field.</p>

<p>In fact, <em>all</em> of the errors I see right now (about 46 of them across
the standard library and <code>rustc</code>) are calls to <code>vec::len()</code> or
<code>vec::each()</code> with the vector in question living in mutable, aliasable
memory.  It is, currently, the only way to accumulate items in a
vector, after all.  However, I haven&#8217;t implemented the full check&#8212;in
particular, I didn&#8217;t implement the check that pattern matching a
variant or through a box requires immutable memory, and so I imagine
there will be some more errors related to that once I do that.</p>

<h2>Solution #1: Swapping</h2>

<p>Of course, this problem is not really a surprise.  The solution I had
in mind for handling unique data that is located in mutable, aliasable
memory is to swap that unique data into your stack frame, where the
compiler can track it (inspired by
<a href="http://lampwww.epfl.ch/~phaller/capabilities.html">Haller and Odersky&#8217;s work on uniqueness</a>, though I&#8217;m sure the
technique predates them).  So the code from the <code>io</code> package could be
rewritten as:</p>

<pre><code>type mem_buffer = @{mut buf: [mut u8],
                    mut pos: uint};

impl of writer for mem_buffer {
    fn write(v: [const u8]/&amp;) {
        let mut buf = [];
        buf &lt;-&gt; self.buf;
        if self.pos == vec::len(buf) { ... }
        ...
        self.buf &lt;- buf;
    }
}
</code></pre>

<p>This makes use of the little known swap (<code>&lt;-&gt;</code>) and move (<code>&lt;-</code>)
assignment forms.  Now the buffer being passed to <code>vec::len()</code> is in
the local variable <code>buf</code>, not the contents of some <code>@</code> box; this means
that <code>vec::len()</code> could not possibly reassign it because there are no
aliases to the local variable <code>buf</code>.</p>
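
<p>The same swap-out, use, swap-back idiom can be sketched in present-day
Rust.  Here <code>Rc</code> with <code>RefCell</code> and <code>Cell</code> fields approximates the
<code>@</code> box with mutable fields.  The append-only body of <code>write</code> is an
assumption made for the example, since the original code elides what the
method does:</p>

<pre><code>use std::cell::{Cell, RefCell};
use std::rc::Rc;

/// Rough analogue of `@{mut buf: [mut u8], mut pos: uint}`.
struct MemBuffer {
    buf: RefCell&lt;Vec&lt;u8&gt;&gt;,
    pos: Cell&lt;usize&gt;,
}

/// Hypothetical `write`; the append-only behaviour is assumed.
fn write(this: &amp;Rc&lt;MemBuffer&gt;, v: &amp;[u8]) {
    // Swap the vector out of the shared box, leaving an empty one behind.
    let mut buf = this.buf.take();
    // `buf` is now an unaliased local, so nothing reachable through
    // another handle to the box can free or replace it.
    buf.extend_from_slice(v);
    this.pos.set(buf.len());
    // Move the vector back into the shared box.
    *this.buf.borrow_mut() = buf;
}
</code></pre>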

<p>It&#8217;s a bit of a pain to write this swapping code each time.  It could
of course be packaged up in a library (here, I&#8217;ve included various
mode declarations, though these would be unnecessary in a purely
region-ified world, as ownership would be done by default):</p>

<pre><code>type swappable&lt;T&gt; = {mut val: option&lt;T&gt;};
impl methods&lt;T&gt; for swappable&lt;T&gt; {
    fn swap(f: fn(+T) -&gt; T) {
        let mut v = none;
        v &lt;-&gt; self.val;
        if v.is_none() { fail "already swapped"; }
        self.val &lt;- some(f(option::unwrap(v)));
    }
}
</code></pre>

<p>Swappable could then be used to build up a dynamically growable
vector library:</p>

<pre><code>type dvec&lt;T&gt; = {buf: swappable&lt;T&gt;};
impl methods&lt;T&gt; for dvec&lt;T&gt; {
    fn add(+e: T) {
        self.buf.swap { |v| v + [e] }
    }

    fn add_all(v2: [T]) {
        self.buf.swap { |v| v + v2 }
    }

    fn each(f: fn(T) -&gt; bool) {
        self.buf.swap { |v| vec::each(v, f); v }
    }
}
</code></pre>

<p>Attempts to add to a vector that is being iterated over would fail
dynamically (basically a more reliable version of Java&#8217;s
<a href="http://stackoverflow.com/questions/4479554/why-vector-methods-iterator-and-listiterator-are-fail-fast">&#8220;fail-fast iterators&#8221;</a>).</p>
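
<p>To make the dynamic failure concrete, here is a small sketch of the same
design in present-day Rust.  The names <code>DVec</code> and <code>try_add</code> are
hypothetical, and one deliberate deviation is labelled in the code: instead
of failing the task, <code>try_add</code> hands the element back as an error when the
buffer is currently swapped out, which makes the behaviour easy to observe.</p>

<pre><code>use std::cell::Cell;

/// Growable vector whose buffer is swapped out while it is in use.
struct DVec&lt;T&gt; {
    buf: Cell&lt;Option&lt;Vec&lt;T&gt;&gt;&gt;,
}

impl&lt;T&gt; DVec&lt;T&gt; {
    fn new() -&gt; Self {
        DVec { buf: Cell::new(Some(Vec::new())) }
    }

    /// Deviation from the post: returns `Err(e)` rather than failing
    /// when the buffer is already swapped out (for example, mid-iteration).
    fn try_add(&amp;self, e: T) -&gt; Result&lt;(), T&gt; {
        match self.buf.take() {
            Some(mut v) =&gt; {
                v.push(e);
                self.buf.set(Some(v));
                Ok(())
            }
            None =&gt; Err(e),
        }
    }

    /// Iterates with the buffer swapped out, then puts it back.
    fn each(&amp;self, mut f: impl FnMut(&amp;T)) {
        let v = self.buf.take().expect("already swapped");
        for e in &amp;v {
            f(e);
        }
        self.buf.set(Some(v));
    }
}
</code></pre>

<p>Calling <code>try_add</code> from inside the closure passed to <code>each</code> finds the
buffer missing and is rejected, so the vector being iterated over is never
modified underneath the loop.</p>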

<h2>Solution #2: Pure functions&#8230;?</h2>

<p>Still, it&#8217;d be nice if one could invoke <code>vec::len()</code> and <code>vec::each()</code>
even when the data is in a mutable location.  After all, neither of
those functions make any changes, and we know that.  One solution I
considered was that we could make use of the <code>pure</code> annotation in a
kind of lightweight effect system.</p>

<p>The basic idea would be that <code>pure</code> functions are functions which do
not modify any aliasable state (today pure functions disallow mutation
of <em>any</em> state, including data interior to the stack frame; we should
<a href="https://github.com/mozilla/rust/issues/1422">fix this regardless</a>). However, drawing on
<a href="http://infoscience.epfl.ch/record/175240/files/ecoop_1.pdf">more work by the Scala folks</a>, we can actually generalize pure
functions somewhat farther: we could allow them to invoke closures so
long as those closures are given in the arguments.  The idea is
basically that a pure function is one which does not make any
modifications to aliasable state <em>except possibly through closures
which the caller itself provided</em>.</p>

<p>These changes would allow us to declare <code>vec::len()</code> and <code>vec::each()</code>
as pure.  In the case of <code>vec::len()</code>, that would be sufficient to
ensure safety without any form of alias check.  Hooray!</p>

<p>But don&#8217;t get too excited: even if <code>vec::each()</code> is declared pure, we
still cannot accept calls like the ones we saw before:</p>

<pre><code>vec::each(self.buf) { |e|
    ...
}
</code></pre>

<p>The reason is that <code>buf</code> is still stored in aliasable, mutable state,
and so we have to be sure that the loop body is safe.  This can be
achieved when the vector is stored in a local variable, as we can
monitor for writes to that variable.  But if the vector is in an <code>@</code>
box, we have to consider any possible alias of that box.  And this
leads us to our next possible solution, alias analysis.</p>

<h2>Solution #3: Alias analysis</h2>

<p>As I said in my <a href="http://smallcultfollowing.com/babysteps/blog/2012/05/01/borrowing">previous post</a>, I am not 100% sure of what analysis
we are doing today.  But if I were to design an alias-based analysis to
address this shortcoming, I imagine it would work something like this:</p>

<ul>
<li>Each callee is guaranteed that every reference is stable (points at
memory which will not be freed) no matter what actions it takes.
This means that the callee is free to call any functions it likes,
including closures, because the caller has guaranteed that all
functions which the callee has access to are harmless.</li>
</ul>


<p>In particular, <code>vec::each(v, f)</code> could safely invoke the <code>f()</code> on each
item in <code>v</code> without fear of <code>v</code> being freed.  It&#8217;s up to the caller to
guarantee that <code>f</code> will not have any harmful effects.</p>

<p>But how can the caller do this?  There are two basic techniques.  The
first is to rely on the guarantees it gets from the outside.  So, if
you have a function like:</p>

<pre><code>fn map&lt;T,U&gt;(v: [T]/&amp;, m: fn(T) -&gt; U) -&gt; [U] {
    let mut r = [];
    for vec::each(v) { |e|
        r += [m(e)];
    }
    ret r;
}
</code></pre>

<p>Here, the call to <code>m()</code> is known to be safe because both <code>v</code> and <code>m</code>
were given as parameters, so it is actually the job of the caller of
<code>map()</code> to ensure that they do not conflict.</p>
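
<p>For comparison, present-day Rust expresses this kind of caller-side
guarantee through borrows in the signature.  The sketch below is in modern
syntax and is only an analogy to the analysis discussed here: the shared
borrow of <code>v</code> obliges the caller to keep the slice alive and unmodified
for the duration of the call, so <code>map</code> can invoke <code>m</code> freely.</p>

<pre><code>/// Modern-syntax sketch of `map`; the caller guarantees that `v` stays
/// valid while `m` runs, because `v` is held by shared borrow.
fn map&lt;T, U&gt;(v: &amp;[T], m: impl Fn(&amp;T) -&gt; U) -&gt; Vec&lt;U&gt; {
    let mut r = Vec::new();
    for e in v {
        r.push(m(e));
    }
    r
}
</code></pre>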

<p>If we can&#8217;t rely on a guarantee from the outside, then, we have to look
at the types.  For example, going back to our example of the buffer, if
we had a loop like:</p>

<pre><code>for self.buf.each { |e|
    some_ptr.buf = [];
}
</code></pre>

<p>Here the assignment to <code>some_ptr.buf</code> would be disallowed if
<code>some_ptr</code> had the same type as <code>self</code>: after all, maybe it is an
alias of <code>self</code>.</p>

<p>We can apply similar reasoning to functions that are invoked:</p>

<pre><code>for self.buf.each { |e|
    clear_buf(some_ptr);
}
</code></pre>

<p>Without knowing what <code>clear_buf()</code> does, we&#8217;d have to reject this
because it has access to data of the same type as <code>self</code> (and hence,
potentially to <code>self</code> itself).</p>

<p>The cool thing about an analysis like this is that it would allow most
of the examples in the standard library to compile mostly as is.  But
there are some downsides.</p>

<p>First, it&#8217;s not clear to me that an analysis like this &#8220;scales well&#8221;.
By scales well I do not mean performance but rather that, while
library code tends to pass, I am not sure that uses of library code
will pass.  For example, suppose I have a shared, growable vector that
encapsulates a unique pointer, rather like Java&#8217;s <code>ArrayBuffer</code>.  And
now I have some library code that does:</p>

<pre><code> my_vec.each { |e| do_some_processing(e); }
</code></pre>

<p>where <code>my_vec</code> is one of these array buffers.  Using an alias check,
it is possible to define the <code>ArrayBuffer.each()</code> method, but that
essentially pushes the requirement to the caller to validate that the
body of the <code>each()</code> loop will not modify <code>my_vec</code>.  Since <code>my_vec</code> is
aliasable, this means that <code>do_some_processing()</code> must not use any
array buffers of its own.</p>

<p>Admittedly, we haven&#8217;t run into these scaling problems so much, but I
am not sure how much to draw from that.  For one thing, the analysis
is buggy today, so it may be that we should be seeing more errors than
we are.  For another, all vectors are unique now, but this is causing
us scaling problems, and we are starting to move away from that.</p>

<p>A second concern about the analysis is that it is anti-encapsulation.
It requires the compiler to have full details about the types of all
data that may be accessed.  When you have types like closures or
interface types, this information is not available, and so the more
we use these abstractions, the worse the analysis performs.
Furthermore, it becomes impossible for modules to &#8220;hide&#8221; the
implementation of a type&#8212;whenever any type definition anywhere
changes, all downstream code must be recompiled or else the memory
safety can no longer be guaranteed.  Admittedly, due to Rust&#8217;s support
for interior types (not everything is a pointer) and inlining, this is
already often the case, but it should still be possible to define
modules that make use of opaque pointer types in the future, allowing
for changes to the implementation where no recompile is necessary.</p>

<p><strong>UPDATE:</strong> A further thought on this matter.  This is a bit different
from requiring recompilation as a matter of course (e.g., because the
size of a record changed)&#8212;that is, there is no guarantee that the
downstream compilation will succeed.  Now, if I add a use of some
vector library in the upstream code, downstream code may fail to
compile, even if the use of the vector library is purely internal and
not exposed through the interface.</p>

<h2>Summary</h2>

<p>I am still leaning towards solution #1, though I appreciate our alias
analysis more and more.  Actually, the fact that I only encountered 46
errors seems pretty decent, especially since most of them are
clustered together.  However, I do expect more such errors when I
implement the pattern matching safety checks, but there we can make
better use of fine-grained copies and so I expect that to be less of a
problem.</p>

<p>Oh, and a final note regarding flow sensitivity: I think I will
implement a flow-sensitive variant of the checker (it&#8217;s a small change
from what I have today), but since we never take the address of locals
today, it&#8217;s a moot point anyhow.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Borrowing]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/05/01/borrowing/"/>
    <updated>2012-05-01T19:53:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/05/01/borrowing</id>
    <content type="html"><![CDATA[<p>I&#8217;ve been working for the last few days on the proper safety
conditions for borrowing.  I am coming into a situation where I am not
sure what would be the best approach.  The question boils down to how
coarse-grained and approximate our algorithm ought to be: in
particular, ought it to be flow sensitive?  But let me back up a bit, first,
and provide a bit of background.</p>

<h2>Background</h2>

<p>Rust bucks the &#8220;new language&#8221; trend by not having a purely
garbage-collected model.  We feature things like interior and unique
types which can be eagerly overwritten.  This means that we have to be
very careful when we create temporary references to those kinds of
values that these references remain valid.</p>

<p>Here is an example of unsafe code:</p>

<pre><code>fn main() {
    let mut v = [1, 2, 3];
    for vec::each(v) { |i|
        v = [];
    }
}
</code></pre>

<p>What is happening here is that we are iterating over the vector <code>v</code>
but, during the iteration, setting the local variable to be the empty
vector.  Because vectors are unique, this will cause the original
vector to be immediately freed&#8212;while the iteration is occurring!</p>

<p>Now, Rust today has an alias checker which is supposed to prevent
these sorts of errors.  However, it has some flaws: for one thing, it
admits the erroneous program I just showed you (that&#8217;s just a
bug). For another, the check is rather complex.  Sufficiently complex
that I don&#8217;t really understand the conditions that it is enforcing.
The core is a type-based alias analysis that tries to figure out what
kinds of data a function could possibly reach. But when it finds
potentially dangerous aliasing going on, the algorithm will sometimes
silently copy the data in question, if that seems harmless enough, or
other times issue warnings or report errors.</p>

<p>In defense of the current algorithm, however, the fact is that if you
want to remain flexible and not force programmers into too many
contortions, it&#8217;s hard to come up with a simple set of rules.  We&#8217;ll
see that as I go on.</p>

<h3>An alternative</h3>

<p>I have been working on a simpler alternative which is based more on
types and less on alias analysis.  The original idea was to base this
analysis purely on the declared mutability of local variables, fields,
and so forth.  We would then conservatively reject programs where
memory safety relied on a potentially mutable location not being
mutated.</p>

<p>I think, however, that I&#8217;ve decided this is so conservative as to be
unusable.  Consider this harmless program, for example:</p>

<pre><code>let mut v = [1, 2, 3];
let l = vec::len(v);
</code></pre>

<p>Under the rules I just gave you, this program would in fact be
illegal.  The reason is that vectors are unique and <code>vec::len()</code>
creates a transient reference to the vector it takes as argument.  The
safety analysis, however, sees that the local variable <code>v</code> is declared
as mutable, and thus considers this to be unsafe: what if <code>v</code>
were somehow changed by <code>vec::len()</code>?</p>

<p>Of course, we know that this is impossible.  <code>vec::len()</code> cannot just
gin up a pointer into the caller&#8217;s stack frame (well, not without an
<code>unsafe</code> block, anyway).  In this case, that&#8217;s pretty obvious: we
never even took the address of <code>v</code>.  The question is where you draw
the line.  How intelligent should the compiler get?  In general,
smarter seems better, but there are two countervailing forces: (1) if
the analysis is too complex, it&#8217;s hard to tell why it&#8217;s giving you an
error; (2) the more complex the analysis, the greater the chance of
bugs in the safety checker itself.  Let me tell you, it&#8217;s not fun to
spend a day tracking down a memory bug that the language supposedly
guarantees to be impossible.</p>

<p>I&#8217;ve been working on a compromise analysis which does not attempt to
do alias analysis but which <em>does</em> track which parts of the stack
frame are aliased or may be borrowed.  The analysis makes very
conservative assumptions about what a function can reach: it assumes
that if a non-unique pointer exists to a given memory location, the
function can access it.  Using this analysis, we can allow functions
to borrow data that is stored in mutable variables that are not
modified, so long as the address of those mutable variables was not
taken.</p>

<p>Under these rules, the <code>vec::len()</code> example is fine.  An example like this
is also fine:</p>

<pre><code>let mut v = [1, 2, 3];
for vec::each(v) { |i|
    io::print(#fmt("%d", i));
}
v = [];
</code></pre>

<p>Here, we iterate over the vector <code>v</code> and then, after the iteration,
clear <code>v</code> to the empty vector.  This mutation is fine because the
reference to the vector created in <code>vec::each()</code> only had a lifetime
equal to the for loop itself.</p>

<p>The example we saw before, where the vector is assigned not after the
loop but rather in the middle, would of course still fail to compile:</p>

<pre><code>let mut v = [1, 2, 3];
for vec::each(v) { |_|
    v = [];
}
</code></pre>

<p>Specifically, the analysis would report that the assignment to <code>v</code>
inside the loop conflicts with the borrow of <code>v</code> on the line before.
In effect, although the variable <code>v</code> is declared as mutable, it
becomes <em>temporarily immutable</em> during the loop.</p>
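
<p>For comparison, present-day Rust behaves much the same way.  The sketch
below uses today&#8217;s syntax rather than the dialect in this post, and
<code>sum_then_clear</code> is a made-up name: the vector is lent to the
iterator for the duration of the loop, and can be mutated again once the
loop is over.</p>

<pre><code>// Present-day Rust sketch of a "temporarily immutable" vector.
fn sum_then_clear(v: &amp;mut Vec&lt;i32&gt;) -&gt; i32 {
    let mut total = 0;
    for i in v.iter() {
        // `v` is on loan to the iterator here.  Uncommenting the next
        // line makes the compiler reject the function.
        // v.clear();
        total += *i;
    }
    // The loan ended with the loop, so mutation is allowed again.
    v.clear();
    total
}
</code></pre>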

<h3>The algorithm at a high-level</h3>

<p>The key ideas that the algorithm tries to enforce are these:</p>

<ul>
<li>borrowing an <code>@T</code> pointer is safe regardless of where it is stored,
because we can just temporarily increase the ref count of the pointer
for the duration of the loan;</li>
<li>borrowing a <code>~T</code> pointer (or a unique vector) is safe if it is
stored in a location that can be considered immutable for the
duration of the loan.</li>
</ul>
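
<p>The first rule can be sketched in present-day Rust, with <code>Rc</code>
standing in for <code>@T</code>.  This is only an analogy under my own
assumptions (<code>Slot</code> and <code>sum_while_overwriting</code> are
hypothetical names), but it shows why bumping the ref count makes the loan
safe even when the place the pointer was stored in gets overwritten:</p>

<pre><code>use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical mutable slot holding a reference-counted vector.
struct Slot {
    data: RefCell&lt;Rc&lt;Vec&lt;i32&gt;&gt;&gt;,
}

fn sum_while_overwriting(slot: &amp;Slot) -&gt; i32 {
    // Take the loan: cloning the `Rc` bumps the reference count.
    let loan = Rc::clone(&amp;*slot.data.borrow());
    // Overwrite the slot while the loan is live.  The old vector survives
    // because `loan` still holds a count on it.
    *slot.data.borrow_mut() = Rc::new(Vec::new());
    loan.iter().sum()
    // `loan` is dropped on return, releasing the old vector.
}
</code></pre>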


<p>We can consider a location to be immutable under two conditions:</p>

<ul>
<li>the type system guarantees it to be immutable.  For example, the
contents of a box of type <code>@T</code> are immutable, as is the value of a
local variable that is not declared as mutable (with one exception,
see below);</li>
<li>the location is uniquely tied to the stack frame itself and the compiler
does not observe any assignments to that location, nor are there any
mutable aliases in scope.</li>
</ul>


<p>The first case is the simple set of rules I wanted to enforce earlier.
The second case is the more complex set I described later, where we can
say that a local variable is &#8220;temporarily immutable&#8221;.  In fact, we can
go a bit further than just local variables, and also talk about the
contents of records stored in the stack, or the contents of unique
pointers found in the stack, or sequences of such things.  All of
those cases share the property that, unless the user takes their
address with the <code>&amp;</code> operator, they cannot be aliased outside the
function itself.</p>

<p>That means that we would accept any of the following equivalent programs:</p>

<pre><code>let mut v = [1, 2, 3];
for vec::each(v) { |i|
    io::print(#fmt("%d", i));
}
v = [];

let r = {mut v: [1, 2, 3]};
for vec::each(r.v) { |i|
    io::print(#fmt("%d", i));
}
r.v = [];

let u = ~mut [1, 2, 3];
for vec::each(*u) { |i|
    io::print(#fmt("%d", i));
}
*u = [];
</code></pre>

<p>However, we would reject this similar-looking program:</p>

<pre><code>let b = @mut [1, 2, 3];
for vec::each(*b) { |i|
    io::print(#fmt("%d", i));
}
</code></pre>

<p>The reason that this program is rejected is that we assume that
<code>vec::each()</code> has access to every <code>@T</code> value (to put it another way,
we assume that every <a href="http://en.wikipedia.org/wiki/Escape_analysis">aliasable value escapes</a>).  So that means we
cannot prove that <code>vec::each()</code> will not overwrite the contents of
<code>*b</code> (a more involved analysis might be able to see that <code>b</code> itself is
never leaked out of the stack frame).</p>

<p>If you want to have unique pointers within mutable boxes, you have to
bring them into your stack frame to work with them, generally making
use of the little known swap operator <code>&lt;-&gt;</code>.  For example, the prior
program might be written:</p>

<pre><code>let b = @mut [1, 2, 3];
let v = [];
*b &lt;-&gt; v; // bring [1, 2, 3] into our stack frame
for vec::each(v) { |i|
    io::print(#fmt("%d", i));
}
*b &lt;-&gt; v; // replace it
</code></pre>

<p>This limitation is basically the same as that taken by other unique
pointer systems.</p>
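
<p>The same move-it-into-your-frame pattern can be sketched in present-day
Rust, where <code>std::mem::swap</code> plays the role of <code>&lt;-&gt;</code> and
<code>Rc&lt;RefCell&lt;...&gt;&gt;</code> is my stand-in for a mutable box.  The
function name is hypothetical and this is an analogy, not a translation of
the checker described here:</p>

<pre><code>use std::cell::RefCell;
use std::mem;
use std::rc::Rc;

fn sum_via_swap(b: &amp;Rc&lt;RefCell&lt;Vec&lt;i32&gt;&gt;&gt;) -&gt; i32 {
    let mut v = Vec::new();
    // Bring the vector into our stack frame, leaving an empty one behind.
    mem::swap(&amp;mut *b.borrow_mut(), &amp;mut v);
    // No alias of `b` can reach `v` while we iterate over it.
    let total: i32 = v.iter().sum();
    // Put the vector back where it came from.
    mem::swap(&amp;mut *b.borrow_mut(), &amp;mut v);
    total
}
</code></pre>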

<p>There is one corner case I haven&#8217;t discussed yet.  Even immutable local
variables can be <em>moved</em> (sent, for example, to another thread).  So
we have to remember which immutable local variables are in use and
prevent moves from occurring.</p>

<h2>And now&#8230; the question I have been leading up to.</h2>

<p>Until now I haven&#8217;t talked at all about the <code>&amp;</code> operator.  One of the
nice features enabled by the work on references and pointer lifetimes
is to allow the user to take the address of a variable on the stack
(previously, this could only be done implicitly through reference-mode
arguments).  This feature <em>is</em> handy, and I&#8217;m a big fan of making
modifications to the local stack frame explicit, but it also
introduces complications.  Consider this variant of our usual example:</p>

<pre><code>let mut v = [1, 2, 3];
let w = &amp;mut v;
for vec::each(v) { |i| ... }
</code></pre>

<p>My check would actually reject this program.  The reason is that it
assumes <code>vec::each()</code> has access to all aliased data&#8212;including <code>w</code>.
Therefore, as in the case of <code>@mut T</code> types, we cannot prevent
<code>vec::each()</code> from overwriting <code>v</code> indirectly by modifying <code>*w</code>.  Here
you would get an error which points out that the existence of an
in-scope alias for <code>v</code> means that it cannot be borrowed by
<code>vec::each()</code>.  This seems reasonable to me.</p>

<p>But how smart is the compiler?  For example, is this program allowed?</p>

<pre><code>let mut v = [1, 2, 3];
let mut x = [4, 5, 6];
let mut w = &amp;mut x;
for vec::each(v) { |i|
    w = &amp;mut v;
}
</code></pre>

<p>Now, on the first iteration of the loop, <code>v</code> is unaliased, but it
becomes aliased during the loop.  Under our normal assumptions, then,
this program must be rejected, for fear of <code>vec::each()</code> assigning to
<code>*w</code> sometime after the first iteration.</p>

<p>Ok, what about this program:</p>

<pre><code>let mut v = [1, 2, 3];
for vec::each(v) { |i| ... }
let mut w = &amp;mut v;
</code></pre>

<p>Here I have moved the alias so it comes after <code>vec::each()</code>.  This
should presumably be ok.</p>

<p>But there is one wrinkle.  Sometimes it is hard to say when code will
execute, particularly around closures.  For example:</p>

<pre><code>let mut x = [4, 5, 6];
let mut v = [1, 2, 3];
let mut w = &amp;x;
debug::indent({||
    for vec::each(v) { |i| ... }
    w = &amp;mut v;
})
</code></pre>

<p>Here, the function <code>debug::indent</code> presumably does something like
cause all debug messages that occur during its argument to be
indented.  So it probably only runs the argument closure once.  But we
don&#8217;t know that.  So we&#8217;d have to reject this program, just in case
<code>debug::indent()</code> called its closure argument twice.</p>
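
<p>One way a type system can carry the missing &#8220;runs at most once&#8221; fact
is shown by present-day Rust, where a callee can declare that it consumes
its closure argument.  The sketch below is only an illustration under that
assumption: <code>run_once</code> is a hypothetical stand-in for
<code>debug::indent</code>, and nothing like it exists in the dialect this
post describes.</p>

<pre><code>/// Hypothetical stand-in for `debug::indent`.  `FnOnce` means the
/// closure is consumed by the call, so it cannot run a second time.
fn run_once&lt;R&gt;(body: impl FnOnce() -&gt; R) -&gt; R {
    body()
}

fn demo() -&gt; (i32, Vec&lt;i32&gt;) {
    let v = vec![1, 2, 3];
    let mut total = 0;
    let moved = run_once(|| {
        for i in &amp;v {
            total += *i;
        }
        // Giving `v` away after the loop is fine: there is no second
        // run in which the loop could observe the moved-out vector.
        v
    });
    (total, moved)
}
</code></pre>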

<p>A similar problem crops up if we allow stack closures (as opposed to
the various kinds of copying closures) to be assigned to variables.
This is currently illegal, but I wouldn&#8217;t mind making it legal
someday.  But then a flow-sensitive analysis would have to understand
(and reject, in this case) code like this:</p>

<pre><code>let mut x = [4, 5, 6];
let mut v = [1, 2, 3];
let mut w = &amp;x;
let foo = fn&amp;() {
    for vec::each(v) { |i| ... }
    w = &amp;mut v;
};
foo();
foo();
</code></pre>

<p>Now on the second call to <code>foo()</code> it&#8217;s possible that <code>vec::each()</code>
might have access to <code>w</code>.</p>

<h2>So the options:</h2>

<p>So here are various options from dumb to smart:</p>

<ol>
<li><p>the compiler does a flow-insensitive analysis with respect to which
references exist.  <em>All</em> of the examples in the previous section
are illegal.  To make them safe, you have to explicitly introduce a block,
like:</p>

<pre><code> let mut v = [1, 2, 3];
 for vec::each(v) { |i|
     ...
 }

 {
     let w = &amp;mut v; // limit the scope of `&amp;`
 }
</code></pre></li>
<li><p>the flow-sensitive analysis rules I described in the previous
section are ok, when things are unclear just do your best;</p></li>
<li>this whole analysis is too dumb.  It should try to determine which
references actually escape the stack frame and track what they point
at and so forth.  Anything it can possibly figure out it should
figure out.</li>
</ol>


<p>I am torn at the moment.  I started with #2 but I am tempted to try #1
and just see how painful it really is.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[In favor of types of unknown size]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/04/27/in-favor-of-types-of-unknown-size/"/>
    <updated>2012-04-27T09:55:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/04/27/in-favor-of-types-of-unknown-size</id>
    <content type="html"><![CDATA[<p>I&#8217;m still thinking about vector and string types in Rust and I think
I&#8217;ve decided what I feel is the best approach.  I thought I&#8217;d
summarize it here and make the case for it.  If you don&#8217;t know what
I&#8217;m talking about, see <a href="http://smallcultfollowing.com/babysteps/blog/2012/04/23/vectors-strings-and-slices">this post</a> for more background.  I&#8217;ll
forward this to the mailing list as well; I&#8217;m sorry if it seems like
I&#8217;m harping on this issue.  I just think vectors and strings are kind
of central data structures so we want them to be as nice as possible,
both in terms of what you can do with them and in terms of the
notations we use to work with them.</p>

<h2>Summary</h2>

<p>First, The Grand ASCII Art Table, summarizing everything (sad fact:
<code>M-x picture-mode</code> is way more convenient than making an HTML table).
Blank spaces indicate things that are inexpressible in one proposal or
the other (for better or worse).</p>

<pre><code>+---------------------++---------------------+
| This proposal:      || Original proposal:  |
|--------+------------||-------+-------------|
| Type   | Literal    || Type  | Literal     |
|--------+------------||-------+-------------|
| [:]T   |            || [T]   | [1, 2, 3]   |
| []T    | [1, 2, 3]  ||       |             |
| &amp;[]T   | &amp;[1, 2, 3] ||       |             |
| @[]T   | @[1, 2, 3] || [T]/@ | [1, 2, 3]/@ |
| ~[]T   | ~[1, 2, 3] || [T]/~ | [1, 2, 3]/~ |
| [3]T   | [|1, 2, 3] || [T]/3 | [1, 2, 3]/_ |
|        |            ||       |             |
| substr |            || str   | "abc"       |
| str    | "abc"      ||       |             |
| &amp;str   | &amp;"abc"     ||       |             |
| @str   | @"abc"     || str/@ | "abc"/@     |
| ~str   | ~"abc"     || str/~ | "abc"/~     |
|        |            || str/3 | "abc"/_     |
+---------------------++---------------------+
</code></pre>


<p>The types <code>[]T</code> and <code>str</code> would represent vectors and strings,
respectively.  These types have the C representation <code>rust_vec&lt;T&gt;</code> and
<code>rust_vec&lt;char&gt;</code>.  They are of <em>dynamic size</em>, meaning that their size
depends on their length.  The literal forms for vectors and strings are
<code>[a, b, c]</code> and <code>"foo"</code>, just as normal.</p>

<p>The types <code>[:]T</code> and <code>substr</code> represent slices of vectors and strings.
Their representation is the pair of a pointer and a length.  They are
each associated with a <a href="http://smallcultfollowing.com/babysteps/blog/2012/04/25/references">lifetime</a> that specifies how long the
slice is valid, and thus can be more fully notated as <code>[:]/&amp;r T</code> and
<code>substr/&amp;r</code>, but users will not have to write this very often, if
ever.</p>

<p>Vectors, strings, and fixed-length vectors are implicitly coercible to
slices just as today.  Furthermore, one can explicitly take a slice
using a Python-like slice notation: <code>v[3:-5]</code> or <code>v[:]</code> to take a
slice of the entire vector.  It is also allowed to take a slice of a
slice.  This is where the <code>:</code> in the slice type comes from: it&#8217;s
supposed to echo this syntactic form.</p>
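
<p>For a concrete feel of slices as a pointer-and-length pair, here is a
sketch in present-day Rust.  Note the assumptions: today&#8217;s syntax is
<code>&amp;v[a..b]</code> rather than the <code>v[a:b]</code> form proposed here,
negative indices are not supported, and <code>trim_ends</code> and
<code>slice_is_two_words</code> are names I made up for the example.</p>

<pre><code>use std::mem::size_of;

/// Drops the first and last element of a slice (empty if too short).
/// Taking a slice of a slice just adjusts the pointer and the length.
fn trim_ends(v: &amp;[i32]) -&gt; &amp;[i32] {
    if v.len() &lt; 2 {
        &amp;v[0..0]
    } else {
        &amp;v[1..v.len() - 1]
    }
}

/// A slice reference is a (pointer, length) pair: two machine words.
fn slice_is_two_words() -&gt; bool {
    size_of::&lt;&amp;[i32]&gt;() == 2 * size_of::&lt;usize&gt;()
}
</code></pre>

<p>Given <code>let v = vec![1, 2, 3, 4, 5];</code>, the slice <code>&amp;v[1..4]</code>
covers <code>2, 3, 4</code>, and <code>trim_ends</code> of that slice covers just
<code>3</code>.</p>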

<p>Fixed-length vectors are written <code>[N]T</code>.  They are represented just
like a C vector <code>T[N]</code>.  The literal form is <code>[| v1, ..., vN]</code>. The
leading <code>|</code> serves to distinguish a fixed-length vector.  It is random
but whatever, this is a specialized use case for C compatibility.  The
length of the literal form is always derived from the number of items.
I opted not to include a way to represent fixed-length strings for the
<a href="http://smallcultfollowing.com/babysteps/blog/2012/04/23/vectors-strings-and-slices">same reasons I previously stated</a>.</p>

<h2>Advantages</h2>

<p>The big advantage is that everything is written the way that seems to
me to be most natural.  For example, a vector on the stack is
<code>&amp;[1, 2, 3]</code>.  A task-local vector is written: <code>@[1, 2, 3]</code>.  A unique
vector is written <code>~[1, 2, 3]</code>.  Same with strings.</p>

<p>I also like that the indication of where memory is allocated is orthogonal
to what is stored in the memory. The type and unary operators <code>&amp;</code>, <code>@</code>
and <code>~</code> tell you where the memory is allocated, and the types which
follow tell you what you will find at that memory.  If we have types
like <code>[1, 2, 3]/@</code>, they combine where the memory is allocated with
what you will find there (to be clear, that is by design, so as to
avoid the disadvantages in the next section).</p>

<p>There is no need for a literal form for slices.  If you create a
vector and then use it where a slice is expected, the type will be
coercible, so no error will result.</p>

<h2>Disadvantages</h2>

<p>The primary disadvantage is that the types <code>[]T</code> and <code>str</code> are of
dynamic length.  This implies a kind distinction that does not exist
today.  I&#8217;d be inclined to just make a rule that types of dynamic
length cannot be used as the types of local variables, fields, vector
contents, nor the values of generic type parameters (and maybe a few
other places).  Later we could add an explicit kind if that seems
necessary.  It basically means you would get an error message like
&#8220;the type <code>[T]</code> has unknown size and cannot be used as the type of a local
variable, use a pointer like <code>@[T]</code> or <code>&amp;[T]</code>&#8221;.</p>
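
<p>As a point of comparison, present-day Rust has a rule of this shape:
<code>[T]</code> and <code>str</code> are types without a statically known size
and can only be used behind some kind of pointer.  The sketch below uses
today&#8217;s pointer types as rough analogues of the sigils in this post
(<code>Box</code> for <code>~</code>, <code>Rc</code> for <code>@</code>); the
function names are made up.</p>

<pre><code>use std::rc::Rc;

/// Works on any contiguous run of `i32`s, wherever it is allocated.
fn total(xs: &amp;[i32]) -&gt; i32 {
    xs.iter().sum()
}

fn totals() -&gt; [i32; 3] {
    // A local of bare type `[i32]` would be rejected: its size is not
    // known at compile time.  Each value below is behind a pointer or
    // has a fixed length instead.
    let on_stack = [1, 2, 3]; // fixed-length array
    let unique: Box&lt;[i32]&gt; = vec![1, 2, 3].into_boxed_slice();
    let shared: Rc&lt;[i32]&gt; = Rc::from(vec![1, 2, 3]);
    // All three coerce to a slice at the call site.
    [total(&amp;on_stack), total(&amp;unique), total(&amp;shared)]
}
</code></pre>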

<p>Having types of unknown size is a complication, to be sure, but I
feel it is a lesser complication than having special types, expression
forms, and rules for vectors and strings.  Furthermore, this same case
(types of unknown size) has come up from time to time when thinking
about other possible future designs, so I am not sure that it can be
avoided.</p>

<p>A second, more subtle point is that slices are no longer the shortest
type in terms of how they are written, although they are probably the
most common thing you will want to use.  I am not too worried about
this either: <code>[:]T</code> is still fairly short and we will use it
ubiquitously.  One thing I don&#8217;t like is that I find <code>[:]</code> somewhat
hard to type.  Maybe that will get easier, or maybe something else
(e.g., <code>[.]</code> and a slice notation of <code>v[1..3]</code>)  would be better.</p>

<h2>Other kinds of variably sized types&#8230;?</h2>

<p>Records of dynamic size are common in C, and we may ultimately have to
be able to model that (though we could admittedly use the C trick,
where it pretends all types have fixed size when in fact the memory
allocated may be greater, combined with unsafe pointers). Still, there
is a legitimate use case for allocating a variably-sized vector
interior to a record even in Rust code, and we could support that
(it&#8217;s the same trick that we in fact use to implement vectors
themselves&#8212;if it&#8217;s important enough for us, maybe it&#8217;s important
enough for our users).</p>

<p>Another example would be base types.  We may sometime want to allow
records or classes that can be extended with subtypes.  In that case,
we could say that the base types have variable size, since the number
of fields they possess is unknown&#8212;this would mean that you only
refer to them by pointer, preventing the common C++ problems of
<a href="http://stackoverflow.com/questions/274626/what-is-the-slicing-problem-in-c">slicing</a> and unsafe array arithmetic.</p>

<p>I&#8217;m not sure where else this comes up.  Perhaps that&#8217;s it.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Permission regions for race-free parallelism]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/04/25/permission-regions-for-race-free-parallelism/"/>
    <updated>2012-04-25T17:50:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/04/25/permission-regions-for-race-free-parallelism</id>
    <content type="html"><![CDATA[<p>I&#8217;ve been making a point of reading academic papers on the train as I
ride home.  It&#8217;s so easy to get behind with the sheer quantity of work
that is being produced.  Anyway, it occurred to me that I ought to try
and summarize the papers I read on this blog so that I can remember
my reactions to them.</p>

<p>I&#8217;ll start with &#8220;Permission Regions for Race-Free Parallelism&#8221;, by
Westbrook, Zhao, Budimlić, and Sarkar.  The basic idea builds off of
Habanero Java, which is a kind of fork of the X10 language that Sarkar
and his group work on.  The basic idea of the paper is to add a
language construct <code>permit</code> which looks like:</p>

<pre><code>permit read(x1,...,xn) write(y1,...,yn) {
    /* this code may read fields of x1...xn and write
       fields of y1...yn */
}
</code></pre>

<p>For example, imagine a method <code>pop()</code> that removes an item from the
front of a linked list:</p>

<pre><code>Node pop() {
    Node tmp = this.next;
    if (tmp != null)
        this.next = tmp.next;
    return tmp;
}
</code></pre>

<p>This would be annotated like so:</p>

<pre><code>Node pop() {
    permit write(this) {
        Node tmp = this.next;
        if (tmp != null)
            permit read(tmp)
                this.next = tmp.next;
        return tmp;
    }
}
</code></pre>

<p>A dynamic monitoring system will then check for races at the
granularity of the <code>permit</code> blocks.  An effect system also allows a
method to be called under the stipulation that reads/writes are
permitted of its parameters. Permission regions are not required for
final fields, naturally.</p>

<p>Finally, they support a view construct for arrays which allows you to declare
permission to access a portion of an array:</p>

<pre><code>region r = ...;
int[.] subA = A.subView(r);
permit write(subA) { ... }
</code></pre>

<p>This is reminiscent of my own <code>divide()</code> method in PJs.</p>

<p>Interestingly, they allow the local variables within a permit section
to be modified.  Presumably each such assignment will lead to a new
dynamic conflict check.</p>

<p>To reduce the annotation burden, the compiler will automatically
insert permission regions.  Basically they find the highest point in
the AST that includes all accesses within a given method, but they do
not cross <code>async</code> or <code>isolated</code> (the HJ keywords for spawning tasks
and for creating transactions).  They find this is usually right.</p>

<p>The whole point of this exercise is to reduce the overhead and (I
believe) improve the accuracy of dynamic checks.  Naturally, the
slowdown for monitoring for data races varied dramatically, but it was
generally around 1.5 to 2x.  There are some exceptions, such as
raytracer, which went as high as 22x.</p>

<h3>My reaction:</h3>

<p>Summary: interested but mildly skeptical.</p>

<p>Their performance numbers seem pretty decent for dynamic monitoring,
but I&#8217;m not sure it meets their goal of &#8220;always on&#8221;.  HJ&#8217;s target
audience after all is scientific computing, and 2x slowdown in that
field seems like a big deal to me.  Still, a lot of people are using R
and Python etc so maybe it is &#8220;fast enough&#8221;.  And of course they can
optimize further, I&#8217;m sure.</p>

<p>The actual semantics of their race check are sort of interesting.  A
narrow focus on data-races over other kinds of races can lead to
programs that do the wrong thing even though they never have any races.
The classic example is Java code like this:</p>

<pre><code>void addIfEmpty(/*shared*/ Vector v, Object o) {
    if (v.isEmpty()) v.add(o);
}
</code></pre>

<p>These two statements were presumably intended to be atomic, but of
course they may not execute atomically.  Nonetheless, since <code>Vector</code>
in Java is a fully synchronized class, there will be no data races
under the technical definition of data races.</p>
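
<p>To make the distinction concrete, here is a small sketch in Rust (chosen
only to keep the examples on this blog in one language; the names are
hypothetical and this is not code from the paper).  Holding a single lock
across both the check and the update makes the pair atomic, which
per-call synchronization of the kind <code>Vector</code> provides does not:</p>

<pre><code>use std::sync::{Arc, Mutex};
use std::thread;

/// Pushes `o` only if the vector is empty.  The lock is held across
/// both the check and the update, so no other thread can interleave.
fn add_if_empty(v: &amp;Mutex&lt;Vec&lt;i32&gt;&gt;, o: i32) -&gt; bool {
    let mut guard = v.lock().unwrap();
    if guard.is_empty() {
        guard.push(o);
        true
    } else {
        false
    }
}

/// Spawns `n` threads that all race to add to the same vector.
fn race_to_add(n: i32) -&gt; Vec&lt;i32&gt; {
    let v = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec&lt;_&gt; = (0..n)
        .map(|i| {
            let v = Arc::clone(&amp;v);
            thread::spawn(move || add_if_empty(&amp;v, i))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let result = v.lock().unwrap().clone();
    result
}
</code></pre>

<p>However the threads are scheduled, exactly one of them wins and the
resulting vector has a single element.</p>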

<p>In any case, declaring permission regions seems to suggest a
sensitivity to this issue, however the use of compiler inference kind
of works against this intuition, since the compiler may not know the
proper places to insert the checks.  (Here, for example, if fully
automated the compiler would still insert the permission regions
within the vector calls themselves)</p>

<p>But I think, in the end, races vs data races is beside the point.
That is, the permission regions are not intended as a kind of
&#8220;declaration of things that go together&#8221; but rather as a practical
means of reducing overhead and controlling the granularity of checks
for detecting data races.  Basically&#8212;I gather that they assume the
system will be always on and, generally, ignored by programmers.  But
if they come up against an issue where performance is a problem, they
will start using the effect system and explicit permission regions to
push these checks to a higher level.</p>

<p>I&#8217;m not sure how effective this will be, however.  A lot of the
overhead seems to derive from the array view checks, which cannot be
automated.  Furthermore, when the number of items to be accessed is
unbounded, such as when walking a linked list, you cannot push the
permission regions out any bigger.  It would be nice to see numbers
that compare the overhead before they tweaked it and after to get an
idea of how much reduction is possible.</p>

<p>It brings to mind my own efforts with PJs etc: I am excited to see
that bear fruit, because I think the overhead of such dynamic checks
can be made <em>extremely</em> low.  Of course my system would not be nearly
as flexible as theirs, which is why the checks are so cheap.  But
basically I like the idea of dynamic checking for races, but I think
it is not necessarily something you want to just layer on top of a
rather broken &#8220;everything shared and mutable all the time&#8221; system,
because the overheads are just too high.  Rather, you start with a
sane foundation, and you should be able to monitor for violations
relatively cheaply and locally.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[References]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/04/25/references/"/>
    <updated>2012-04-25T07:53:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/04/25/references</id>
    <content type="html"><![CDATA[<p>I want to do an introduction to the regions system I&#8217;ve been working
on.  This is work-in-progress, so some of the details are likely to
change.  Also, I&#8217;m going to try some new terminology on for size:
although it has a long history in the literature, I think the term
&#8220;region&#8221; is not particularly accurate, so I am going to use the term
&#8220;lifetime&#8221; or &#8220;pointer lifetime&#8221; and see how it fits.</p>

<p>In this post I&#8217;m just going to show some examples of how the new
features can be used.  In the next post, I&#8217;ll lift the curtain a bit
and explain how the checks work.</p>

<h3>Introduction</h3>

<p>Rust has always (at least, as long as I&#8217;ve been around) had three
sorts of pointers: <code>@T</code>, which is a task-local pointer into the heap;
<code>~T</code>, a unique pointer into the heap, generally (but not exclusively)
used for sending data between tasks; and reference mode arguments,
used to give a function a temporary pointer.</p>

<p>The goal of this work is to replace reference mode arguments with
something more flexible.  Reference mode arguments work quite well for
many purposes, but they have one primary limitation: they cannot be
stored into data structures.</p>

<p>So, in this branch, we (conceptually at least) remove reference mode
arguments from the Rust Pointer Pantheon and replace them with
reference types, written <code>&amp;T</code> (this is actually a shorthand, as we
will see later).  I will refer to a variable of reference type as a
reference.</p>

<p>References are basically generic pointers.  They can point anywhere:
into the stack, into the <code>@</code> heap, into the <code>~</code> heap, even into the
inside of a record or vector.  They can point at anything that a C
pointer could point at and can be used in many of the same ways;
however, they are free from the errors that C permits.  The type
checker guarantees that references are always valid, so you can&#8217;t have
a reference into freed memory, or into a stack frame that has been
popped, and so forth.</p>

<p>We&#8217;ll get to the full details of how the safety check works later
(probably in a separate post).  First, I want to give some examples of
using references.</p>

<h3>Using references</h3>

<h4>Simple references and borrowing</h4>

<p>Let&#8217;s create a record type <code>point</code> for use in our examples:</p>

<pre><code>type point = { x: uint, y: uint };
</code></pre>

<p>Now, imagine that we have a function which wants to compute the slope
of two points.  It doesn&#8217;t particularly care where those points are
allocated.  You could write it like so:</p>

<pre><code>fn slope(p1: &amp;point, p2: &amp;point) -&gt; float {
    let y = (p2.y - p1.y) as float;
    let x = (p2.x - p1.x) as float;
    ret y / x;
}
</code></pre>

<p>OK, that was fairly straightforward.  Now let&#8217;s look at how <code>slope()</code>
might be called.  First, assume that we have some routine which takes
a vector of pairs of points allocated on the heap and computes the
maximum slope of any of those pairs.  Why you would want such a
function, I don&#8217;t know, but this is how you would write it:</p>

<pre><code>fn max_slope(ps: [(@point, @point)]) -&gt; float {
    ps.max { |(p1, p2)| slope(p1, p2) }
}
</code></pre>

<p>You&#8217;ll notice that <code>slope()</code> is called with <code>p1</code> and <code>p2</code>, which have
type <code>@point</code>, not <code>&amp;point</code>.  The type checker happily accepts this,
however, because a reference can point anywhere, including into the
heap.  This process of converting one kind of pointer to another is
called <em>borrowing</em>.</p>
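
<p>For comparison, here is a rough sketch of the same idea in present-day Rust
syntax, which differs from the syntax used in this post.  Several things here
are assumptions made for illustration: <code>Rc</code> and <code>Box</code> stand in
loosely for <code>@</code> and <code>~</code>, the fields are floats rather than
<code>uint</code>, and the borrow is written explicitly with <code>&amp;</code> at the
call site:</p>

<pre><code>use std::rc::Rc;

struct Point { x: f64, y: f64 }

// `slope` does not care where the points are allocated.
fn slope(p1: &amp;Point, p2: &amp;Point) -&gt; f64 {
    (p2.y - p1.y) / (p2.x - p1.x)
}

fn main() {
    let on_stack = Point { x: 0.0, y: 0.0 };
    let shared = Rc::new(Point { x: 2.0, y: 4.0 });  // roughly the role of @point
    let unique = Box::new(Point { x: 4.0, y: 4.0 }); // roughly the role of ~point

    // &amp;Rc&lt;Point&gt; and &amp;Box&lt;Point&gt; are both coerced to &amp;Point here.
    println!("{}", slope(&amp;on_stack, &amp;shared)); // prints 2
    println!("{}", slope(&amp;shared, &amp;unique));   // prints 0
}
</code></pre>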

<p>The reason it&#8217;s called borrowing is that, in effect, the callee
(<code>slope()</code>) borrows a reference from the caller (<code>max_slope()</code>).  It
is the caller&#8217;s job to ensure that this reference remains valid for
the duration of the callee.  In this case, there is no extra work
required to make that true, but in some cases the compiler may be
required to increment a ref count or maintain a GC root (this is
highly dependent on how the <code>@</code> heap is managed, naturally).</p>

<p>You can also borrow <code>~</code> pointers.  This basically works the same as
with <code>@</code> pointers, except that the unique value cannot be moved away
(for example, sent to another task) while it is borrowed.  The reason
for that is that, for the duration of the borrowing, the unique
pointer is no longer unique.  So if you sent it to another task, for
example, then two tasks would have access to it.  Even within a single
task, if you gave the pointer away, then there would be multiple
copies each claiming to be unique, which would lead to double frees
and other badness.  The key invariant that <em>borrowing</em> maintains is
that, while a <code>~T</code> may be temporarily aliased, all of the aliases are
references, not other <code>~T</code> pointers.  So we can always identify the
true owner once the borrowing expires.</p>
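
<p>The rule about not moving a unique value while it is borrowed can be sketched
in present-day Rust as follows.  This is an illustration under assumptions, not
the implementation described here: <code>Box</code> stands in for <code>~</code>, and
the helper names <code>total</code> and <code>consume</code> are hypothetical.  The
commented-out line is the one that would give the value away while a reference
to it is still live:</p>

<pre><code>// Borrows the data temporarily.
fn total(v: &amp;[i32]) -&gt; i32 {
    v.iter().sum()
}

// Takes ownership of the unique value.
fn consume(v: Box&lt;[i32]&gt;) -&gt; usize {
    v.len()
}

fn main() {
    let owned: Box&lt;[i32]&gt; = vec![1, 2, 3].into_boxed_slice();
    let borrowed = &amp;owned;           // the borrow begins
    // consume(owned);               // rejected if uncommented: `owned` is still borrowed
    println!("{}", total(borrowed)); // last use of the borrow; prints 6
    println!("{}", consume(owned));  // fine now, the borrow has expired; prints 3
}
</code></pre>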

<p>Right now, borrowing can only occur in method calls.  The borrowing
lasts for the duration of the method call.  In the future, borrowing
will also be possible in <code>alt</code> expressions and when assigning a local
variable with <code>let</code>.  In the former case, the borrowing will last for
the duration of the <code>alt</code> expression.  In the latter case, the borrow
will last until the local variable goes out of scope (until the end of
the enclosing block, in other words).</p>

<h4>Taking the address of local variables</h4>

<p>Sometimes we wish to give away pointers into our local stack.  For
example, there is a routine today called <code>vec::push(x, y)</code> which has
the effect of appending the value <code>y</code> onto the vector <code>x</code> (in place).
This can be implemented using references like so:</p>

<pre><code>fn push&lt;T:copy&gt;(v: &amp;mut [T], elt: T) {
    *v = *v + [elt];
}
</code></pre>

<p>Here the argument <code>&amp;mut [T]</code> indicates a mutable reference: that is, a
reference which can be used to modify the data it points at.  The
requirement to explicitly declare which pointers may be used for
modification stems from Rust&#8217;s desire to make mutation explicit, and
is analogous to the existing <code>@mut T</code> and <code>~mut T</code> types.</p>

<p>To call push, we might write code like this:</p>

<pre><code>fn accum() {
    let mut v = [1, 2, 3];
    vec::push(&amp;v, 4);
    vec::push(&amp;v, 5);
}
</code></pre>

<p>Here we used the <code>&amp;</code> operator to take the address of a local variable
so that we could pass it into the <code>push()</code> routine.</p>

<p><em>An aside:</em> I believe that in the current implementation of the
compiler you would have to write <code>vec::push(&amp;mut v, 4)</code>&#8212;that is, you
would have to declare when taking the address of <code>v</code> that you intend
to mutate through this pointer.  I believe there is no reason we can&#8217;t
lift this restriction, however, and allow the compiler to figure it
out for itself. (I rather prefer the explicit form in theory, because
I like to make it clear when things are being modified, but I suspect
it will be annoying in practice)</p>
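
<p>As a point of comparison, a minimal sketch of <code>push()</code> and
<code>accum()</code> in present-day Rust syntax looks like this (the growable
<code>Vec</code> type and the return value of <code>accum()</code> are assumptions
added so the example can be run).  Note that it uses the explicit
<code>&amp;mut</code> form at the call site:</p>

<pre><code>// Appends `elt` to the vector in place through a mutable reference.
fn push&lt;T&gt;(v: &amp;mut Vec&lt;T&gt;, elt: T) {
    v.push(elt);
}

fn accum() -&gt; Vec&lt;i32&gt; {
    let mut v = vec![1, 2, 3];
    push(&amp;mut v, 4); // take the address of the local, declaring intent to mutate
    push(&amp;mut v, 5);
    v
}

fn main() {
    println!("{:?}", accum()); // prints [1, 2, 3, 4, 5]
}
</code></pre>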

<h4>Copying into the stack</h4>

<p>Right now, if you wish to create a record literal on the stack, you
have to manipulate it by value.  So you might write code like:</p>

<pre><code>fn create_point() {
    let p1 = { x: 3u, y: 4u };
    let p2 = { x: 5u, y: 10u };
    let p3 = if cond {p1} else {p2};
    ...
}
</code></pre>

<p>Here the type of <code>p{1,2,3}</code> is <code>point</code>.  But often we wish to
manipulate values by pointer.  In this case, that would make <code>p3</code> a
cheaper copy, for example.  Using references, we can write something
like this:</p>

<pre><code>fn create_point() {
    let p1 = &amp;{ x: 3u, y: 4u };
    let p2 = &amp;{ x: 5u, y: 10u };
    let p3 = if cond {p1} else {p2};
    ...
}
</code></pre>

<p>Here we used the same <code>&amp;</code> operator, but with an rvalue (an expression
that is not assignable).  This simply allocates space on the stack and
copies the value into it.  The corresponding type of <code>p{1,2,3}</code> would then
be <code>&amp;point</code>, where <code>&amp;</code> is a reference into the stack of
<code>create_point()</code>.</p>
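
<p>A small sketch of the by-pointer version in present-day Rust syntax, under
the assumption that a named <code>struct</code> replaces the record type and that
the hypothetical function <code>pick</code> returns a value so there is something
to observe:</p>

<pre><code>struct Point { x: u32, y: u32 }

fn pick(cond: bool) -&gt; u32 {
    // Taking a reference to an rvalue: the value stays alive until the
    // end of the enclosing block.
    let p1 = &amp;Point { x: 3, y: 4 };
    let p2 = &amp;Point { x: 5, y: 10 };
    let p3 = if cond { p1 } else { p2 }; // copies a pointer, not the record
    p3.x + p3.y
}

fn main() {
    println!("{} {}", pick(true), pick(false)); // prints 7 15
}
</code></pre>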

<h4>Placing references into structures</h4>

<p>Next let&#8217;s look at a case where we wish to store a reference into a
structure.  This example comes out of the Rust compiler, but it&#8217;s a
common pattern in practice.</p>

<p>In the Rust compiler, there is a phase of processing called encode in
which we generate the metadata for a compiled crate.  During this
encoding, we have a struct <code>encode_ctxt</code> that stores the various
context which is required.  Because this structure is only needed
during this one phase, it is allocated on the stack, and we pass it
from function to function using references (today, using a reference
mode argument).</p>

<p>The code to create this encode context looks something like the following:</p>

<pre><code>type encode_ctxt = { /* contents are not important */ };

fn begin_encoding(...) {
    let ecx = &amp;{ /* allocate an encode context */ };
    for items_to_encode.each { |item|
        encode_item(ecx, item);
    }
}
</code></pre>

<p>Here you see that <code>begin_encoding()</code> creates a variable <code>ecx</code>,
storing the data onto the stack.  This context is then passed to each
call to <code>encode_item()</code>.</p>

<p>What can happen then is that some subpart of the encoding requires
its own context.  For example, in our metadata encoding, we sometimes
have to serialize the AST for an inlinable function.  This requires quite
a bit more state, but it&#8217;s state that is specific to the inlining itself.
So we can define a type <code>inline_ctxt</code> that will include both the encoding
context <code>ecx</code> along with some other fields:</p>

<pre><code>type inline_ctxt/&amp; = {
    ecx: &amp;encode_ctxt,
    ...
};
</code></pre>

<p>What you see here is that the type <code>inline_ctxt</code> is declared like any
other record, but it has this <code>/&amp;</code> following the name.  This is a
declaration that the type will contain references.  The record
itself then simply embeds the <code>&amp;encode_ctxt</code> as any other field.
<em>Note:</em> It&#8217;s possible that the <code>/&amp;</code> might become inferred in the
future rather than being explicit.</p>

<p>Now I can write functions that create and use the inlined context as
follows:</p>

<pre><code> fn encode_inlined_item(ecx: &amp;encode_ctxt, ...) {
     let icx = &amp;{ecx: ecx, ...};
     ...
     some_helper_func(icx, ...);
     ...
 }

 fn some_helper_func(icx: &amp;inline_ctxt, ...) {
     // ... can use icx, icx.ecx, etc ...
 }
</code></pre>
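
<p>For readers more familiar with present-day Rust, the same pattern can be
sketched as below.  The explicit lifetime parameter <code>'a</code> plays a role
similar to the <code>/&amp;</code> marker; the fields <code>crate_name</code> and
<code>depth</code> are hypothetical stand-ins for the elided contents:</p>

<pre><code>struct EncodeCtxt { crate_name: String }

// The lifetime parameter declares that this type contains references.
struct InlineCtxt&lt;'a&gt; {
    ecx: &amp;'a EncodeCtxt,
    depth: u32,
}

fn some_helper_func(icx: &amp;InlineCtxt&lt;'_&gt;) -&gt; String {
    // Can use icx, icx.ecx, etc.
    format!("{} at depth {}", icx.ecx.crate_name, icx.depth)
}

fn encode_inlined_item(ecx: &amp;EncodeCtxt) -&gt; String {
    let icx = InlineCtxt { ecx, depth: 1 }; // lives on this stack frame
    some_helper_func(&amp;icx)
}

fn main() {
    let ecx = EncodeCtxt { crate_name: "demo".to_string() };
    println!("{}", encode_inlined_item(&amp;ecx)); // prints demo at depth 1
}
</code></pre>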

<h4>References in boxes</h4>

<p>In the previous example, we created a structure on the stack which
contained a reference to some data living in an activation somewhere
up the stack.  It is also possible to place references into heap
objects.  For example, I could have allocated the <code>inline_ctxt</code> on
the heap like so:</p>

<pre><code> fn encode_inlined_item(ecx: &amp;encode_ctxt, ...) {
     let icx = @{ecx: ecx, ...};
     ...
     some_helper_func(icx, ...);
     ...
 }

 fn some_helper_func(icx: @inline_ctxt, ...) {
     // ... can use icx, icx.ecx, etc ...
 }
</code></pre>

<p>In this case, there is not really much reason to do this, as the lifetime
of the <code>inline_ctxt</code> is bound to the stack frame that created it.  But
it can be convenient in a number of scenarios:</p>

<ul>
<li>a long computation might make use of internal data that can be collected
before the computation itself completes, and this internal data may
need to contain references;</li>
<li>allocating values that you plan to return to your caller is most
conveniently done with an <code>@</code> pointer.</li>
</ul>


<p>This last point is interesting.  Basically, in most of our examples
we&#8217;ve been allocating things on the stack&#8212;but you can&#8217;t return stuff
that&#8217;s on your stack up to your caller, clearly (and if you try, in
Rust at least, you&#8217;ll find that a type error results).</p>

<h4>Arenas</h4>

<p>One very common C trick for speeding up allocation is to make use of
memory pools, also called arenas.  If you happen to have a lot of
allocations which you plan to do but which will all get freed at one
point, then you can allocate a big block of memory and just hand it
out piece by piece.  Once the pass is done, you free the memory all at
once.  The key is that you never track whether an individual
allocation has completed or not, so you avoid a lot of overhead. The
problem with arenas is that, as typically implemented, they are
unsafe, because you might free the arena but still hold on to pointers
that point into the arena.  This is where lifetimes come in.</p>

<p>Using a reference, we can allocate memory in arenas and be sure that
the reference will not outlive the arena itself.  For example, this
function will allocate a new point in an arena and return it:</p>

<pre><code>fn alloc_point(pool: &amp;arena) -&gt; &amp;point {
    ret new (pool) { x: 3u, y: 4u };
}
</code></pre>

<p>In this case, the type checker will assign the allocated point the
same lifetime as the arena itself.  So the point can be used so long
as the arena is valid.</p>
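
<p>The lifetime relationship can be sketched in present-day Rust with a toy
arena.  This is only an illustration under assumptions: the <code>Arena</code> type
here is a hypothetical wrapper around a <code>Vec</code>, there is no
<code>new (pool)</code> syntax, and because the arena is borrowed mutably only one
allocated point can be held at a time.  What it does show is the returned
reference sharing the lifetime of the pool:</p>

<pre><code>struct Point { x: u32, y: u32 }

// A toy arena: everything allocated in it is freed together when it is dropped.
struct Arena { points: Vec&lt;Point&gt; }

// The returned reference carries the same lifetime as the borrow of the pool.
fn alloc_point&lt;'a&gt;(pool: &amp;'a mut Arena) -&gt; &amp;'a Point {
    pool.points.push(Point { x: 3, y: 4 });
    pool.points.last().unwrap()
}

fn main() {
    let mut arena = Arena { points: Vec::new() };
    let p = alloc_point(&amp;mut arena);
    // drop(arena);                // rejected if uncommented: `p` still points into it
    println!("{} {}", p.x, p.y);   // prints 3 4
}
</code></pre>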

<h3>Lifetimes</h3>

<p>At this point, I&#8217;ve shown you a lot of examples of how references can
be used, but I have given basically no intuition for how it is that the
compiler can prevent a reference from being used when it is no longer
valid.</p>

<p>The basic idea is that every reference type <code>&amp;T</code> is in fact shorthand
for a type written <code>&amp;a.T</code>, where <code>a</code> is some kind of <em>lifetime</em>.  The
lifetime of a reference defines when it is valid.  These lifetimes
correspond to the dynamic execution of some function, block,
expression, whatever.</p>

<p>To make this clearer, let&#8217;s look at an example.  Suppose I have this
simple function.  I have also shown the various lifetimes (named
<code>a</code>&#8230;<code>c</code>) graphically along the right-hand side.</p>

<pre><code>fn scoped_lifetimes(x: @uint) { // a
    let y = 3u;                 // |
                                // |
    if cond {                   // | b
        let z = 4u;             // | |
                                // | | c
        borrow(x) /* 1 */       // | | |
                                // | | -
    }                           // | -
}                               // -

fn borrow(x: &amp;uint) {...}
</code></pre>

<p>There are three distinct lifetimes in the function
<code>scoped_lifetimes()</code>, each nested within one another. The outermost
one is <code>a</code>, which corresponds to the entire function activation.  The
expression <code>&amp;y</code>, which takes the address of the local variable <code>y</code>,
would have type <code>&amp;a.uint</code>.</p>

<p>The next lifetime is <code>b</code>, which corresponds to the &#8220;then-block&#8221; of the
if statement.  The expression <code>&amp;z</code> would have the type <code>&amp;b.uint</code>,
because after the if statement concludes the variable <code>z</code> is no longer
in scope.</p>

<p>Finally, the lifetime <code>c</code> corresponds just to the call to <code>borrow(x)</code>.
Here, the variable <code>x</code> is coerced into a reference of type
<code>&amp;c.uint</code>, that is, a reference whose lifetime is <code>c</code>.</p>

<p>Now let&#8217;s examine <code>borrow()</code> a bit more closely.  The definition of borrow
is in fact shorthand for something like the following:</p>

<pre><code>                         //  d
                         //  .
                         //  .
fn borrow(x: &amp;d.uint) {  //  |
    ...                  //  |
}                        //  |
                         //  .
                         //  -
</code></pre>

<p>In other words, the <code>&amp;uint</code> type we saw before in fact expands to a
type whose lifetime has a unique name; we&#8217;ll call this name <code>d</code> (in fact, all
uses of <code>&amp;</code> within the types of a function&#8217;s parameters or its return
type are references to a special region called the anonymous
region&#8212;it acts just like a named region, except that it doesn&#8217;t have
a name).</p>

<p>The lifetime <code>d</code> is a bit different from the other lifetimes we&#8217;ve
seen, as it appears within the function declaration itself: it is in
fact a lifetime parameter.  That is, it corresponds to some lifetime
which the caller will specify&#8212;the callee, <code>borrow()</code> in this case,
doesn&#8217;t know precisely how long the lifetime <code>d</code> lasts, it only knows
that <code>d</code> includes the entire execution of the callee. I&#8217;ve tried to
depict this in my ASCII art diagram using dots to represent the
unknown duration, with the pipes <code>|</code> representing what is known for
certain.  In the call to <code>borrow(x)</code> which we saw before, the lifetime
parameter <code>d</code> would be mapped to the lifetime <code>c</code> from
<code>scoped_lifetimes()</code>.</p>
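
<p>In present-day Rust syntax, a lifetime parameter like <code>d</code> is written
<code>'d</code>.  The sketch below restates the example under a few assumptions made
so that it runs: <code>Box</code> stands in for <code>@</code>, <code>cond</code> is
passed as a parameter, and <code>borrow()</code> returns a value.  Each call to
<code>borrow()</code> instantiates <code>'d</code> with a different caller-chosen
lifetime:</p>

<pre><code>// `'d` is chosen by the caller; the callee only knows that it covers
// the whole execution of `borrow`.
fn borrow&lt;'d&gt;(x: &amp;'d u32) -&gt; u32 {
    *x + 1
}

fn scoped_lifetimes(x: Box&lt;u32&gt;, cond: bool) -&gt; u32 {
    let y = 3;                     // `&amp;y` is valid for the whole function body
    if cond {
        let z = 4;                 // `&amp;z` is valid only inside this block
        borrow(&amp;x) + borrow(&amp;y) + borrow(&amp;z)
    } else {
        borrow(&amp;y)
    }
}

fn main() {
    println!("{}", scoped_lifetimes(Box::new(10), true));  // prints 20
    println!("{}", scoped_lifetimes(Box::new(10), false)); // prints 4
}
</code></pre>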

<h4>Detecting errors</h4>

<p>The compiler uses these symbolic lifetimes to prevent problems.
Consider something simple like this:</p>

<pre><code>fn give_away() -&gt; &amp;uint {
    let y = 3u;
    ret &amp;y;
}
</code></pre>

<p>Here there is an error because the function is attempting to return a
pointer into its own stack frame.  To see how the compiler detects
this, consider the lifetimes involved:</p>

<pre><code>                            // a
                            // .
                            // .
fn give_away() -&gt; &amp;a.uint { // | b
    let y = 3u;             // | |
    ret &amp;y;                 // | |
}                           // | -
                            // .
                            // -
</code></pre>

<p>Here I have called the anonymous lifetime parameter <code>a</code>.  The
expression <code>&amp;y</code> has type <code>&amp;b.uint</code>, which does not match the expected
type <code>&amp;a.uint</code>, and so we get a type error.  This type error is
warning us that the lifetime of the pointer we are trying to return
(<code>b</code>) is shorter than the lifetime which was declared (<code>a</code>).</p>
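
<p>The same error exists in present-day Rust.  The sketch below keeps the
rejected function as a comment and shows two accepted alternatives; the names
<code>give_back</code> and <code>give_value</code> are hypothetical and are not taken
from this post:</p>

<pre><code>// Rejected if uncommented: `y` lives only as long as the function body,
// which is shorter than the caller-chosen lifetime `'a`.
// fn give_away&lt;'a&gt;() -&gt; &amp;'a u32 {
//     let y = 3;
//     &amp;y
// }

// Accepted: the returned reference has exactly the lifetime the caller supplied.
fn give_back&lt;'a&gt;(y: &amp;'a u32) -&gt; &amp;'a u32 {
    y
}

// Accepted: return the value itself rather than a pointer to it.
fn give_value() -&gt; u32 {
    let y = 3;
    y
}

fn main() {
    let n = 3;
    println!("{} {}", give_back(&amp;n), give_value()); // prints 3 3
}
</code></pre>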

<h3>Ta ta for now</h3>

<p>There&#8217;s more to tell, but I&#8217;ll stop here, as this post is already
plenty long.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Vectors, strings, and slices]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/04/23/vectors-strings-and-slices/"/>
    <updated>2012-04-23T14:31:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/04/23/vectors-strings-and-slices</id>
    <content type="html"><![CDATA[<p>We&#8217;ve been discussing a lot about how to manage vectors and strings in
Rust.  Graydon sent out an excellent proposal which allows for a great
number of use cases to be elegantly handled.  However, I find the syntax
somewhat misleading.  I&#8217;ve proposed one alternative on the mailing
list, but I now find I don&#8217;t like it, so I thought I&#8217;d brainstorm a
bit and try to find something better.</p>

<p>There are really three use cases:</p>

<ul>
<li>vectors and strings, which are either allocated on the task heap
(<code>@</code>) or exchange heap (<code>~</code>);</li>
<li>slices, which are a cheap, stack-bound way to represent subvecs and
substrings;</li>
<li>fixed-length vectors, which are mainly for C compatibility.</li>
</ul>


<p>In this post I&#8217;m going to focus on the first two cases. The last use
case (fixed-length vectors) is, I think, quite distinct from the first
two, and we should separate it out.  I have also omitted one use case
which Graydon&#8217;s proposal included: fixed-length strings.  I don&#8217;t
think that the type <code>str/10</code> is of much use, as it refers to a
by-value string that is <em>always</em> 10 characters long.  This is not like
a fixed-length buffer that can hold strings <em>up to</em> 10 characters
long.  Rather, as I understand it, it can <em>only</em> store strings of
exactly 10 characters.  How often are we likely to want that?</p>

<h3>Representation</h3>

<p>The representation of a vector or string is something like:</p>

<pre><code>struct rust_vec&lt;T&gt; {
    int fill;    // How many bytes are used
    int alloc;   // How many bytes are allocated
    T   data[0]; // Inline data
};
</code></pre>

<p>Note in particular that this structure does not have a fixed size.
Rather, it will vary depending on how many items are present.  This
inline layout is efficient but causes us a bit of trouble.</p>

<p>The representation of a slice which Graydon proposed is something
like this:</p>

<pre><code>struct slice&lt;T&gt; {
    T *data;
    int length;
};
</code></pre>

<p>Basically, the pair of a pointer and a length of memory.</p>

<p>As an aside, fixed-length vectors have yet a third representation:
<code>T[N]</code>.  In other words, just a C-like vector.  So you can see that
slices, &#8220;vectors as a whole&#8221;, and fixed-length vectors are quite
different things to the compiler.</p>
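
<p>The pointer-plus-length representation of a slice can be observed in
present-day Rust, where a slice reference occupies two machine words and a
fixed-length array is stored inline.  This is a sketch for illustration only;
the helper names are hypothetical, and the present-day <code>Vec</code> type is
not laid out like the <code>rust_vec</code> struct above:</p>

<pre><code>use std::mem::size_of;

// Number of machine words in a slice reference: one pointer plus one length.
fn slice_ref_words() -&gt; usize {
    size_of::&lt;&amp;[u8]&gt;() / size_of::&lt;usize&gt;()
}

// A sub-slice is a new (pointer, length) pair into the same memory;
// nothing is copied.  Assumes `v` has at least two elements.
fn middle(v: &amp;[u8]) -&gt; &amp;[u8] {
    &amp;v[1..v.len() - 1]
}

fn main() {
    println!("{}", slice_ref_words());              // prints 2
    println!("{}", size_of::&lt;[u8; 10]&gt;());          // prints 10: the data is inline
    println!("{:?}", middle(&amp;[1, 2, 3, 4, 5]));     // prints [2, 3, 4]
}
</code></pre>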

<h3>Proposal the first</h3>

<p>One idea might be something like this:</p>

<pre><code>Proposed   Graydon   Representation
[T]        [T]       slice&lt;T&gt;
vec&lt;T&gt;               rust_vec&lt;T&gt;
@vec&lt;T&gt;    [T]/@     rust_box&lt;rust_vec&lt;T&gt;&gt;*
~vec&lt;T&gt;    [T]/~     rust_vec&lt;T&gt;*

substr     str       slice&lt;char&gt;
str                  rust_vec&lt;char&gt;
@str       str/@     rust_box&lt;rust_vec&lt;char&gt;&gt;*
~str       str/~     rust_vec&lt;char&gt;*
</code></pre>

<p>The literal forms would basically stay the same as they are today.  So
<code>[x1, x2]</code> has the type <code>vec&lt;T&gt;</code> and <code>"foo"</code> has the type <code>str</code>.</p>

<p>I have intentionally drawn a big distinction between the type of a
slice (<code>[T]</code>) and the type of a vector (<code>vec&lt;T&gt;</code>).  I think these
things are similar but different and people might be easily confused
if the notation is too similar.</p>

<p>There are two types here that cannot be expressed in the original
system: <code>vec&lt;T&gt;</code> and <code>str</code>.  There is a good reason that these types
are inexpressible: they do not have a fixed size.  So, allowing them
as types introduces a certain danger.  For example, a function like
the following could not be compiled:</p>

<pre><code> fn foo(x: @vec&lt;T&gt;) {
     let y = *x;
     ...
 }
</code></pre>

<p>After all, the size of the stack frame could not be correctly
calculated; it would depend on how much data was in <code>x</code>.  It is particularly
annoying to deal with this situation due to the possibility of writing
generic functions like</p>

<pre><code> fn gen_foo&lt;U&gt;(x: @U) {
     let y = *x;
     ...
 }
</code></pre>

<p>There are two solutions to this, which are really the same solution in
different guises.  The simplest solution is to say that type variables
cannot be bound to the types <code>vec&lt;T&gt;</code> and <code>str</code>.  This would prevent
us from calling <code>gen_foo()</code> with a vector or a string.  We&#8217;d also have
some kind of special treatment around assignments so that <code>let x =
[1, 2, 3]</code> ends up with a slice, I guess.  Have to think a bit about
that.</p>

<p>Alternatively, one could have a bound that indicates data of a known
size.  This kind would be required to manipulate instances of <code>T</code> by
value.  But this could rapidly become annoying.  You might prefer to
have the default be that types <em>do</em> have a known size and you have to
say when the type variable might <em>not</em>.  This is also ok although
generally type bounds <em>enable</em> operators, not <em>disable</em> them.</p>

<p>Both solutions are somewhat annoying and I know that Graydon was
trying to avoid them in his design.</p>
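
<p>For what it is worth, the second variant (known size by default, with an
explicit opt-out) is how present-day Rust works: a type parameter is assumed to
have a known size unless it is declared <code>?Sized</code>.  The sketch below
illustrates that rule; the function names are hypothetical, and the
<code>Copy</code> bound is only there so the value can be duplicated out of the
reference:</p>

<pre><code>use std::mem::size_of_val;

// Default: `U` has a known size, so it can be manipulated by value.
fn gen_foo&lt;U: Copy&gt;(x: &amp;U) -&gt; U {
    let y = *x;
    y
}

// Opt-out: `U` may be a type without a fixed size, such as `str` or `[u8]`,
// but then it can only be used behind a pointer.
fn gen_size&lt;U: ?Sized&gt;(x: &amp;U) -&gt; usize {
    size_of_val(x)
}

fn main() {
    println!("{}", gen_foo(&amp;7));                    // prints 7
    println!("{}", gen_size("hello"));              // U = str; prints 5
    println!("{}", gen_size(&amp;[1u8, 2, 3][..]));     // U = [u8]; prints 3
}
</code></pre>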

<h3>Proposal the second</h3>

<p>If we wanted to avoid the possibility of types whose size is not
known, then we have to take a different tack.  We can&#8217;t have the <code>@</code>
be a prefix anymore.  I&#8217;d still rather it come near the <em>front</em> of the
type, and not tacked on the end.  So far, my preferred notation for
<em>this</em> is something like the following:</p>

<pre><code>Proposed   Graydon   Representation
[]T        [T]       slice&lt;T&gt;
[@]T       [T]/@     rust_box&lt;rust_vec&lt;T&gt;&gt;*
[~]T       [T]/~     rust_vec&lt;T&gt;*

""         str       slice&lt;char&gt;
"@"        str/@     rust_box&lt;rust_vec&lt;char&gt;&gt;*
"~"        str/~     rust_vec&lt;char&gt;*
</code></pre>

<p>Yes, that&#8217;s right, I just proposed using <code>""</code> as the way to write the
type for strings.  Pretty wacky, I know.  But it seems like we need a
type name that has two parts, a begin and an end, so that we can stick
the <code>@</code> and <code>~</code> inside of them.  An alternative might be to use more
words (<code>str</code> vs <code>tstr</code> vs <code>ustr</code> or something).</p>

<p>The literal forms for vectors could be something like <code>[@|...]</code> and
<code>[~|...]</code>.  I don&#8217;t know about strings.</p>

<h3>What about fixed-length vectors?</h3>

<p>I don&#8217;t know.  We could do <code>[N]T</code>, but then it kind of looks like a
slice.  I personally lean towards something like <code>T * N</code>.  We also
need an expression form.  To be honest, I don&#8217;t care about this <em>that</em>
much, it seems like macros could solve it too.</p>

<h3>Summary&#8230;</h3>

<p>None of these ideas seem perfect.  I&#8217;m mostly tossing them out there
to ensure we keep talking about it.</p>

<p>p.s., I just realized as I read over the post that I forgot about
mutability.  Something like <code>[mut T]</code> cannot be written as <code>vec&lt;mut
T&gt;</code>, at least not today.  Sigh.  Well, I&#8217;ll post this blog post
anyway.  As I said, nothing here is perfect, just wanted to capture
some of the things I&#8217;ve been thinking about.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[On types and type schemes]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/04/23/on-types-and-type-schemes/"/>
    <updated>2012-04-23T08:54:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/04/23/on-types-and-type-schemes</id>
    <content type="html"><![CDATA[<p>After my recent dalliance in
<a href="http://smallcultfollowing.com/babysteps/blog/2012/04/15/syntax-matters-dot-dot-dot/">Matters of a Truly Trivial Nature</a>, I&#8217;d like to return to
Matters Most Deep and Profound.  I&#8217;m running up against an interesting
question with regions that has to do with the nature of function types
like <code>fn(&amp;int)</code>: up until now, I&#8217;ve assumed that this refers to a
function that takes an integer pointer in some region that is
specified by the caller.  That is, it is a kind of shorthand for a
type that might be written like <code>fn&lt;r&gt;(&amp;r.int)</code>, where the <code>&lt;r&gt;</code>
indicates that the function type is <em>parameterized</em> by the region <code>r</code>.</p>

<h3>But first, a digression on types and type schemes&#8230;</h3>

<p>This notation is analogous to a generic function, like:</p>

<pre><code>fn identity&lt;T&gt;(t: T) -&gt; T { ret t; }
</code></pre>

<p>However, there is an important distinction.  In Rust, as in ML,
parameterization only occurs on named items.  So, you can have a named
function <code>identity</code> defined generically, but you cannot have a type
<code>fn&lt;T&gt;(T) -&gt; T</code>.</p>

<p>This is an interesting and subtle point.  In fact, all Rust types are
<em>monotypes</em>, meaning types that refer to exactly one thing.  Now, it
may not be known precisely what that thing <em>is</em>, but there must be a
name for it.  So, the type of the <code>t</code> parameter is <code>T</code>, which is a
type variable.  This is a monotype, it can only refer to one thing:
the type <code>T</code>.  It just so happens, however, that we do not know when
typechecking <code>identity</code> what that type <code>T</code> is.</p>

<p>The type of the <code>identity</code> function itself, however, cannot be
represented as a monotype.  We cannot name a specific type for its
parameter and return value; it could safely be used with any type.  To
accommodate this concept, ML introduced the idea of a <em>type scheme</em>,
also called a polytype (at least by <a href="http://en.wikipedia.org/wiki/Hindley%E2%80%93Milner">Wikipedia</a>; I&#8217;ve never
heard the term before, but it seems logical).</p>

<p>A type scheme is basically a type along with a set of <em>bound type
variables</em>.  A bound type variable is one that is defined within the
scheme itself.  So, if you have the type scheme <code>fn&lt;T&gt;(T) -&gt; T</code>, then
the variable <code>T</code> is said to be <em>bound</em> in this scheme.  In a scheme
like <code>fn&lt;T&gt;(T, U) -&gt; T</code>, the variable <code>T</code> is bound, but the variable
<code>U</code> is called <em>free</em>, as it is not defined in the scheme.  Note that
in a monotype all variables are free, as monotypes do not define any
type variables.</p>
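
<p>The gap between the generic item and the types that can actually be written
down can be sketched in present-day Rust, where it still holds.  The variable
names below are hypothetical.  The named function <code>identity</code> is
generic, but every function-pointer type it is assigned to fixes <code>T</code> to
one concrete type:</p>

<pre><code>fn identity&lt;T&gt;(t: T) -&gt; T { t }

fn main() {
    // Each use instantiates the scheme at one monotype.
    let on_ints: fn(i32) -&gt; i32 = identity;    // T instantiated to i32
    let on_bools: fn(bool) -&gt; bool = identity; // T instantiated to bool

    // There is no single function-pointer type that covers both uses.
    println!("{} {}", on_ints(3), on_bools(true)); // prints 3 true
}
</code></pre>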

<p>So, in a way, a type like <code>fn&lt;r&gt;(&amp;r.T)</code> is really a type scheme, although it
binds a region variable rather than a type variable.  It can still refer to many concrete
types.  This entails complexity.</p>

<h3>&#8230;and now back to regions.</h3>

<p>So, the question is, should region variables be bound or free within a
function type?  It certainly makes life simpler if they are always
free, and it still results in a fairly expressive system.</p>

<p>But first let&#8217;s examine why bound regions make life complex.  To help
keep things clear, I will use the explicit &#8220;bound region&#8221; notation I
introduced earlier, even though it&#8217;s not an actual Rust type, and I
will eschew anonymous regions. This means that the notation for
writing function types and so forth will be a bit heavier than it
would be in &#8220;real life&#8221;.</p>

<p>I will use a few conventions: lowercase letters early in the alphabet
like <code>a</code>, <code>b</code>, and <code>c</code> refer to bound regions.  Lowercase letters late
in the alphabet (<code>r</code>, <code>s</code>) refer to free regions.  Plus, I generally
drop the types of a region pointer if they are not important, so let
<code>&amp;r</code> be shorthand for something like <code>&amp;r.int</code>.</p>

<h4>Subtyping of bound regions</h4>

<p><strong>Renaming of bound regions.</strong> Imagine a type <code>A=fn&lt;a&gt;(&amp;a)</code> and a type
<code>B=fn&lt;b&gt;(&amp;b)</code>.  Is <code>A</code> a subtype of <code>B</code>?  Clearly, the answer should
be yes: they are basically the same type, as you could rename the
bound variable <code>a</code> in <code>A</code> to <code>b</code>, and they would be precisely the
same.  So here we find that we have to consider possible renamings of
bound regions when considering subtyping.</p>

<p><strong>Instantiating bound regions.</strong> OK, well, now imagine the type
<code>C=fn(&amp;r)</code>.  Here, the region variable <code>r</code> is <em>not</em> bound but rather
free.  So in this case, is <code>A</code> a subtype of <code>C</code>?  I would argue that
the answer <em>should be</em> yes: after all, if you <em>instantiate</em> <code>A</code> with
the value <code>r</code> for <code>a</code>, you get <code>fn(&amp;r.T)</code>, the same as <code>C</code>.  So we
ought to consider possible instantiations of bound variables as well.</p>

<p><strong>Coalescing bound regions.</strong> Finally, one more example:</p>

<pre><code>fn&lt;a,b&gt;(&amp;a, &amp;b)     &lt;:    fn&lt;c&gt;(&amp;c, &amp;c)
</code></pre>

<p>Here the subtype is more flexible than the supertype.  The subtype
accepts two region pointers in any two regions, but the supertype
requires that they be in the same region.</p>
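
<p>For what it is worth, the three relations above (renaming,
instantiating, coalescing) can be sketched with function pointers in the
syntax of later Rust releases, where <code>for&lt;'a&gt;</code> plays the role of
my &#8220;bound region&#8221; notation.  The function <code>first</code> and the two
statics are hypothetical names, and this is only an illustration of the
relations, not a description of how the checker decides them.</p>

<pre><code>// Both regions are bound: the function works for any two regions.
fn first&lt;'a, 'b&gt;(x: &amp;'a i32, _y: &amp;'b i32) -&gt; i32 {
    *x
}

static ONE: i32 = 1;
static TWO: i32 = 2;

fn main() {
    let general: for&lt;'a, 'b&gt; fn(&amp;'a i32, &amp;'b i32) -&gt; i32 = first;

    // Renaming: the names of bound regions do not matter.
    let renamed: for&lt;'c, 'd&gt; fn(&amp;'c i32, &amp;'d i32) -&gt; i32 = general;

    // Coalescing: two bound regions may be merged into one.
    let coalesced: for&lt;'c&gt; fn(&amp;'c i32, &amp;'c i32) -&gt; i32 = general;

    // Instantiating: the bound regions are replaced by a free one ('static).
    let instantiated: fn(&amp;'static i32, &amp;'static i32) -&gt; i32 = general;

    let local = 3;
    assert_eq!(renamed(&amp;local, &amp;ONE), 3);
    assert_eq!(coalesced(&amp;local, &amp;TWO), 3);
    assert_eq!(instantiated(&amp;ONE, &amp;TWO), 1);
}
</code></pre>

<p>Each assignment goes from the more flexible type to the less flexible
one; none of them would be accepted in the opposite direction.</p>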

<h4>&#8230;with type variables, too</h4>

<p>Now, just to make things more fun, imagine we throw in a type variable
<code>X</code> into the mix.  Here we play the role of the inference engine,
which is trying to find a value for <code>X</code>.  So the question becomes,
is there any type that I can assign to <code>X</code> which would make the subtype
relation true?</p>

<p><strong>Referring to free regions.</strong> Let&#8217;s start with a simple example.
  Consider:</p>

<pre><code>fn(&amp;r)    &lt;:    fn(X)
</code></pre>

<p>In this case, <code>r</code> is free, so we can assign <code>X</code> the value of <code>&amp;r</code> and
everything should be fine.</p>

<p><strong>Bound regions.</strong>  Ok, what if the subtype refers to a bound region?</p>

<pre><code>fn&lt;a&gt;(&amp;a)    &lt;:    fn(X)
</code></pre>

<p>We can still handle this case, but it requires a <em>region variable</em> as well.
In other words, if we create a region variable <code>R</code>, then we can substitute
that region variable for <code>a</code> and obtain:</p>

<pre><code>fn(&amp;R) &lt;: fn(X)
</code></pre>

<p>Now we can assign <code>X</code> to <code>&amp;R</code> and then assume that the inference
engine will find a suitable region for <code>R</code>.  This will be based on
constraints from the rest of the program.</p>

<p><strong>Multiple parameters.</strong>  Of course, if there are multiple parameters,
there may be interactions between them:</p>

<pre><code>fn&lt;a&gt;(&amp;a, &amp;a)    &lt;:    fn(X, &amp;r)
</code></pre>

<p>In this case, because <code>r</code> appears free in the supertype, <code>X</code> can be
assigned <code>&amp;r</code>.  That would mean that <code>a</code> can be instantiated with
<code>r</code> and the subtyping relation holds.</p>

<p><strong>Bound regions within the supertype.</strong> What if the region in the
supertype is bound, not free?</p>

<pre><code>fn&lt;a&gt;(&amp;a, &amp;a)    &lt;:    fn&lt;c&gt;(X, &amp;c)
</code></pre>

<p>In this case, there is no value of <code>X</code> which is suitable.  This is
perhaps not obvious: you might think that <code>&amp;c</code> would be a fine value
for <code>X</code>.  But that means that the value of <code>X</code> would refer to the
region <code>c</code>, which is bound within the type.  It&#8217;s a scoping violation.
The name <code>c</code> has no meaning outside of the supertype, whereas the type
<code>X</code> (which appears free) does have meaning.  So <code>X</code> cannot refer to
regions bound within the supertype.</p>

<h4>Woah.</h4>

<p>Yeah, it&#8217;s complex.  I haven&#8217;t come up with an elegant implementation
for the inference engine that accommodates all of these scenarios.
One option is to not handle all of these cases.  I also ought to read
up in The Literature as well as the implementations of other languages
(e.g., Haskell, Scala) to see what they do in similar scenarios.
Still, I dislike the idea of having things in our type system that
require citations to explain.</p>

<h3>So, can we just drop bound type variables in function types?</h3>

<p>Actually, I think we definitely <em>could</em> (not yet sure if we <em>should</em>).
Most things I&#8217;ve thought of will &#8220;just work&#8221;, and when they won&#8217;t,
there is a workaround via interfaces (more on that later).</p>

<p>First, an example of something that works:</p>

<pre><code>fn iter&lt;T&gt;(v: [T], f: fn(&amp;T)) {
    uint::range(0, v.len()) { |i|
        f(&amp;v[i]);
    }
}
</code></pre>

<p>This function iterates over each item in the slice <code>v</code> and invokes the
function <code>f</code> (I am assuming Graydon&#8217;s work on slices and vectors is
complete).  If we fully expand this type to see all the regions
involved, you end up with:</p>

<pre><code>fn iter&lt;T&gt;(v: [T]/&amp;a, f: fn/&amp;a(&amp;a.T)) { ... }
</code></pre>

<p>This signature is probably a bit confusing.  As usual, I find the best
way to think about regions is as lifetimes (in fact, I am considering
changing my terminology over to use the word lifetime exclusively).
So what this notation means is that there is some span of time <code>a</code> in
which the vector data and the function closure are valid.  The function
itself expects a pointer which is also valid for this same span of
time (in this case, that pointer will be a pointer into the vector
contents, so its lifetime comes from there).  This span of time <code>a</code>
will generally be the call to <code>iter()</code> itself.</p>
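
<p>Here is a rough sketch of the same idea in the syntax of later Rust
releases (an assumption on my part; the generic parameter <code>F</code> and
the <code>FnMut</code> bound are not part of the signature above).  The point
it illustrates is that the region <code>'a</code> is free in the type of
<code>f</code>: it is fixed once by the caller rather than chosen afresh on each
invocation.</p>

<pre><code>// The region 'a is free in the type of `f`: it is picked by the caller of
// `iter`, not instantiated separately for each call to `f`.
fn iter&lt;'a, T, F&gt;(v: &amp;'a [T], mut f: F)
where
    F: FnMut(&amp;'a T),
{
    for i in 0..v.len() {
        f(&amp;v[i]);
    }
}

fn main() {
    let data = vec![1, 2, 3];
    let mut seen: Vec&lt;&amp;i32&gt; = Vec::new();

    // Every pointer handed to the closure is valid for all of 'a,
    // so the closure may even hold on to them.
    iter(&amp;data, |x| seen.push(x));

    assert_eq!(seen, vec![&amp;1, &amp;2, &amp;3]);
}
</code></pre>

<p>Had the closure type bound its region instead, stashing the pointers in
<code>seen</code> would have to be rejected, since nothing would be known about
how long each one lives.</p>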

<h3>What doesn&#8217;t work?</h3>

<p>Basically, what doesn&#8217;t work is when you want to have a function whose
arguments have lifetimes that are not yet known.  This
most commonly occurs when functions are stored into records.  One
example that comes to mind is the <code>hash</code> and <code>eq</code> functions that we
use to implement hashtables right now.</p>

<p>Currently, our hashtables are defined with a structure something like:</p>

<pre><code>type hash&lt;K,V&gt; = {
    hashfn: fn(&amp;K) -&gt; uint,
    eqfn: fn(&amp;K, &amp;K) -&gt; bool,
    ...
};
</code></pre>

<p>Here you see that the <code>hashfn</code> takes a pointer to the key <code>K</code> and
returns the hash (a <code>uint</code>).  The <code>eqfn</code> takes two keys and returns a
boolean indicating whether they are equal.</p>

<p>The key point here is that the lifetimes for the key arguments are not
known and cannot be known in advance.  The data for the hashtable is
stored in a structure on the heap and so its lifetime is not
stack-based and hence has no region; for any given hashtable
operation, the current array will be borrowed and thus tied to the
stack of that particular operation, but these future operations cannot
be given a name.</p>

<h4>Did you say something about a workaround?</h4>

<p>Yes, we can use ifaces to work around this problem.  Imagine an iface:</p>

<pre><code>iface hash_key_ops&lt;K&gt; {
    fn hash(k: &amp;K) -&gt; uint;
    fn eq(k1: &amp;K, k2: &amp;K) -&gt; bool;
}
</code></pre>

<p>I am mildly abusing ifaces here because the &#8220;self&#8221; would not be a
particular key but rather some singleton object representing the hash
function itself.  For example, I might define:</p>

<pre><code>enum murmur_hash { murmur_hash };
impl of hash_key_ops&lt;str&gt; for murmur_hash {
    fn hash(k: &amp;str) -&gt; uint { ... }
    fn eq(k1: &amp;str, k2: &amp;str) -&gt; bool { k1 == k2 }
}
</code></pre>

<p>Now we could define the hashtable like:</p>

<pre><code>type hash&lt;K,V&gt; = {
    ops: hash_key_ops/@&lt;K&gt;,
    ...
};

fn new_hash&lt;K,V&gt;(ops: hash_key_ops/@&lt;K&gt;) { ... }
</code></pre>

<p>Now whenever we want to hash a key we can invoke <code>tbl.ops.hash(key)</code>.
The key point is that the named functions in an iface, just like
function items, can have polytypes even though normal function types
are monotypes.  Then each time we invoke <code>hash()</code> we would instantiate
the bound regions with fresh region variables.</p>
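
<p>A minimal sketch of this workaround, again in the syntax of later Rust
releases and therefore an assumption rather than the design described here:
<code>HashKeyOps</code>, <code>LengthHash</code>, <code>Table</code> and
<code>keys_eq</code> are hypothetical names (the last one renamed from
<code>eq</code> only to avoid any confusion with the built-in equality
method), and the &#8220;hash&#8221; is just the key length.</p>

<pre><code>// The methods are named items, so each one has a polytype: the region of
// the key pointer is bound by the method, not by the table.
trait HashKeyOps&lt;K: ?Sized&gt; {
    fn hash(&amp;self, k: &amp;K) -&gt; usize;
    fn keys_eq(&amp;self, k1: &amp;K, k2: &amp;K) -&gt; bool;
}

// Singleton value standing for the hash function itself.
struct LengthHash;

impl HashKeyOps&lt;str&gt; for LengthHash {
    fn hash(&amp;self, k: &amp;str) -&gt; usize {
        k.len()
    }
    fn keys_eq(&amp;self, k1: &amp;str, k2: &amp;str) -&gt; bool {
        k1 == k2
    }
}

// The table stores the operations without naming any key region.
struct Table&lt;K: ?Sized&gt; {
    ops: Box&lt;dyn HashKeyOps&lt;K&gt;&gt;,
}

fn main() {
    let tbl: Table&lt;str&gt; = Table { ops: Box::new(LengthHash) };
    let long_lived = String::from("alpha");
    {
        let short_lived = String::from("beta");
        // Each call instantiates the key region afresh, so keys with
        // unrelated lifetimes are fine.
        assert_eq!(tbl.ops.hash(&amp;short_lived), 4);
        assert!(!tbl.ops.keys_eq(&amp;long_lived, &amp;short_lived));
    }
    assert_eq!(tbl.ops.hash(&amp;long_lived), 5);
}
</code></pre>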

<p>Of course, if we were going to use ifaces with hashtables, we might
rather define the iface over the key type itself.  That raises some
interesting issues about instance coherence which I plan to discuss in
a blog post Real Soon Now, but if you&#8217;re curious about <em>that</em> you may
also want to read my <a href="https://mail.mozilla.org/pipermail/rust-dev/2011-December/001036.html">mailing list post</a> on the topic as it is
still my preferred solution.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Syntax matters...?]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/04/15/syntax-matters-dot-dot-dot/"/>
    <updated>2012-04-15T19:55:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/04/15/syntax-matters-dot-dot-dot</id>
    <content type="html"><![CDATA[<p>For a long time, it was considered fairly obvious, I think, that
syntax didn&#8217;t really matter.  It was just the surface skin over the
underlying ideas.  In recent times, though, the prevailing wisdom has
reversed, and it is now quite common to hear people talk about how
<a href="https://www.google.com/search?q=syntax%20matters">&#8220;syntax matters&#8221;</a>.</p>

<p>While I don&#8217;t exactly disagree, I think that the importance of trivial
syntactic matters is generally overemphasized.  It is not a matter of
life and death whether or not semicolons are required to end a line,
for example, or whether parentheses are required in making a call.</p>

<p>Naturally, like all programmers, I have strong opinions on these
topics myself&#8212;or at least I used to.  But I&#8217;ve found over time that
one gets used to these matters quickly enough, for the most part.  But
I think there is a deeper sense in which syntax <em>can</em> matter.</p>

<p>Basically, there are some languages whose syntax is so distinctive
that it makes a qualitative difference to the experience of
programming.  In this case, having a different syntax can enable
something otherwise very challenging&#8212;or sometimes make simple things
extremely difficult.  Three examples come immediately to mind; I&#8217;m sure
there are more.</p>

<h3>Lisp</h3>

<p>Lisp (and its derivatives: Scheme, Clojure, etc.) is a common example.
The Lisp family of languages is fairly unique in that the syntax of
programs is the same as the syntax of the data structures in the
language (oddly, XSLT is the only other example I can think of; but
I&#8217;m sure there are more).  This is sometimes, and somewhat
grandiously, referred to as the <a href="http://en.wikipedia.org/wiki/Homoiconicity">&#8220;homoiconic&#8221;</a> property.</p>

<p>Homoiconicity makes it possible to have a very simple macro system
which can seamlessly integrate with the language.  This is simply
very, very challenging to do with a more traditional C-like syntax.
So, in this case, syntax really matters.</p>

<h3>Smalltalk</h3>

<p>Most languages pass the parameters to a method using a positional
notation.  This may be written with parentheses (<code>foo(a, b, c)</code>) or
without (<code>foo a b c</code>) but the idea is basically the same.  Smalltalk
took a different approach.  In Smalltalk, each parameter is labeled,
and the name of the method as a whole is the concatenation of all of
these labels.  So you don&#8217;t write <code>foo.open("abc", true, false)</code> but
rather <code>foo open:"abc" read:true write:false</code>.  This may seem like a
small change, but it is not.  It has far-reaching consequences;
consequences which I think are not fully appreciated.  For example, it
is no accident that Smalltalk pioneered most of the powerful
refactorings we associate with Java and other statically typed
languages today&#8212;method names in Smalltalk are long and generally
unique, so you don&#8217;t need full type information for the compiler to
reliably trace them.</p>

<p>Another effect of this convention is to make certain classes of errors
impossible.  For example, one simply cannot provide the wrong number
of parameters to a method (the method name would not match).
Similarly, it is obvious what each parameter means and in what order
they should go.  With a call like <code>foo.open("abc", true, false)</code>, the
reader has no idea what <code>true</code> and <code>false</code> signify.  When the call
looks like this, <code>foo.open("abc", write, read)</code>, the reader <em>thinks</em>
they know what <code>write</code> and <code>read</code> signify, but without seeing the
source of the method, they can&#8217;t know for sure.  In fact, here, I
reversed the order.  This kind of error is surprisingly common, as an
old colleague of mine <a href="http://mp.binaervarianz.de/issta2011.pdf">described in a paper</a>.  But this error
is unthinkable in Smalltalk, as you would have to write <code>foo
open:"abc" read:write write:read</code>, making it quite clear that
something was amiss.</p>
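
<p>A language with positional calls can approximate the labeling with a
record of named fields.  The following is only a sketch in the syntax of
later Rust releases; <code>OpenOptions</code> here is a hypothetical type
defined on the spot, and <code>open</code> is a stand-in that merely reports
the mode it was asked for.</p>

<pre><code>// Hypothetical options record; the field names play the role of
// Smalltalk's keyword labels.
struct OpenOptions&lt;'a&gt; {
    path: &amp;'a str,
    read: bool,
    write: bool,
}

// Stand-in for a real `open`: it only describes the requested mode.
fn open(opts: OpenOptions&lt;'_&gt;) -&gt; String {
    let mode = match (opts.read, opts.write) {
        (true, true) =&gt; "read-write",
        (true, false) =&gt; "read-only",
        (false, true) =&gt; "write-only",
        (false, false) =&gt; "none",
    };
    format!("{} opened {}", opts.path, mode)
}

fn main() {
    // Every argument is labeled at the call site, and order is irrelevant.
    let msg = open(OpenOptions { write: false, path: "abc", read: true });
    assert_eq!(msg, "abc opened read-only");
}
</code></pre>

<p>This catches swapped arguments, but unlike Smalltalk the labels are not
part of the function&#8217;s name, so it does nothing for refactoring tools.</p>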

<h3>Fortress</h3>

<p>I had the good fortune of talking to some of the guys on the Fortress
team a while back.  Fortress is home to some very interesting
ideas&#8212;parallel evaluation by default, for example!&#8212;and one of them
is that mathematical programs ought to be written in mathematical
notation, or something very close to it.  I can see that this is a
very appealing notion for mathematicians and physicists, as it will
help to lower the impedance barrier between the program and the theory
it models.  But it is also interesting for the developers of Fortress,
since mathematical notation is incredibly overloaded&#8212;meaning that
they are developing all manner of interesting new dynamic overloading
resolution techniques to make this whole thing work.  So in this case,
the syntax is a mixed bag: it makes some things easier (translating
math) and some things harder (defining the language semantics).</p>

<h3>The siren call of pretty syntax</h3>

<p>So yes, I do think syntax can matter.  But most of the time it
doesn&#8217;t.  This is not to say I am immune to the appeal of pretty
syntax (I&#8217;m as guilty as everyone else), nor that prettiness doesn&#8217;t
matter at all.  But mostly it&#8217;s a matter of <em>familiarity</em>.  Like
anything else, a new language will look a bit different, and you have
to get used to it. (Even Objective C looks pretty good to me now, for
crying out loud!)  Sometimes, though, things still seem hard to read
even after you&#8217;ve been hacking in the language for a while: these are
things that need changing.</p>

<p>It is important, however, to distinguish between <em>syntax</em> and
<em>expressiveness</em>.  I don&#8217;t care (too much) whether you write
<code>function(x) { ... }</code>, <code>\x -&gt; ...</code> or <code>{ |x| ... }</code> to denote a
closure, but there had better be a way to write a closure somehow!
(Java, I&#8217;m looking at you here)</p>

<p>It makes me a bit sad that there is so much focus these days on the
surface side of syntax&#8212;making indentation significant, omitting a
semicolon&#8212;but rather little on how a change in syntax can actually
change the experience of programming in a deeper way.</p>

<p><em>Aside #1:</em> It is too bad that the genuine advantages of
Lisp and Smalltalk syntax do not seem to have been sufficient to win
over the familiarity of a generally C-like look-and-feel.</p>

<p><em>Aside #2:</em> In case you can&#8217;t tell, I&#8217;m partially responding to the
fact that every time somebody posts a link about Rust, somebody else
makes some comment about the length of our keywords.  My personal
favorite is <a href="http://news.ycombinator.com/item?id=3826528">this one</a>, which seems to imply that we Rust
developers are involved in some kind of conspiracy&#8212;as if we <em>prefer</em>
endlessly defending our choice of <code>ret</code> over <code>return</code> rather than,
say, our choice of sendable unique pointers over shared memory.
Please.</p>

<p><em>Aside #3:</em> To be clear, I don&#8217;t think Rust&#8217;s syntax is anything
revolutionary.  It is basically a C derivative, like so many languages
these days.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[DOA: Don't overabstract]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/04/11/doa-dont-overabstract/"/>
    <updated>2012-04-11T13:21:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/04/11/doa-dont-overabstract</id>
    <content type="html"><![CDATA[<p>I&#8217;d like to propose a term for code that has been &#8220;over-DRY&#8217;d&#8221;
(desiccated?).  I occasionally run across some method which just seems
<em>horribly complex</em>.  Reading it closer, it usually turns out that what
happened is that two or three independent operations got collected
into one subroutine.  Perhaps they started out as doing almost the
same thing&#8212;but before long, they diverged, and now the subroutine
has grown a hundred parameters and has a control-flow path that
requires a whiteboard and an ultra-super-fine-point marker to follow.
But, just as often, you can tear this routine apart into two or three
routines that read just fine, even if they share a line or two of code
in common.  So I&#8217;m going to start calling such routines &#8220;DOA&#8221;, though
the acronym has a bit of a <a href="http://en.wikipedia.org/wiki/Dead_on_arrival">different expansion</a> when used as an
adjective.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Declared vs duckish typing]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/04/10/declared-vs-duckish-typing/"/>
    <updated>2012-04-10T11:19:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/04/10/declared-vs-duckish-typing</id>
    <content type="html"><![CDATA[<p>One of the questions in our object system is precisely how
&#8220;declared&#8221; we want things to be when it comes to interfaces and
implementations.  In a discussion on IRC, graydon suggested it&#8217;d
be nice to have terms like &#8220;duck-typing&#8221; defined more precisely in
a Rust syntax, and he is correct.  So here is my effort.</p>

<h3>The current setup</h3>

<p>Currently, implementations must declare precisely what interfaces they
implement.  For example, it looks like this:</p>

<pre><code>impl of draw for T {
    ...
}
</code></pre>

<p>where <code>draw</code> is an interface.  Then, later, if we have an instance of
type <code>S</code> and we wish to know whether it implements the interface
<code>draw</code>, we can scan through the set of implementations that are
declared to implement <code>draw</code> and see if any of them are for the type
<code>S</code>.</p>

<h3>A more duck-typing like setup</h3>

<p>Another option would be to remove the requirement that an impl
declares what interfaces it implements.  In that case, when we have a
need to know if the type <code>S</code> implements the iface <code>draw</code>, we would
again scan all of the implementations in scope for the type <code>S</code>.  For
each one, we would check whether it contains all the methods defined
in <code>draw</code>.  If so, we declare it to be an implementation of the iface (we
must also check that the methods contain the right types; it&#8217;s unclear
to me whether we should do this check before or after deciding that it
is an implementation, though).</p>

<h3>Why duck typing?</h3>

<p>It&#8217;s more convenient.  There is also, currently, no good way to create
an &#8220;after the fact&#8221; interface: suppose I have a bunch of types that
all already have a <code>draw()</code> method and a <code>bounds()</code> method defined,
and I&#8217;d like to make an iface like:</p>

<pre><code>iface draw_and_bounds {
    fn draw();
    fn bounds();
}
</code></pre>

<p>and then just use it.  Now everything just works.  In the more statically
declared world, I would then have to go over each type and do something
like:</p>

<pre><code>impl of draw_and_bounds for S { }
impl of draw_and_bounds for T { }
</code></pre>

<p>These impls just serve to declare that the type <code>S</code> (and <code>T</code>)
implements the iface <code>draw_and_bounds</code> (and needs no additional
methods to do so).  Actually, this wouldn&#8217;t work today at all, because
we don&#8217;t check for existing methods when deciding, so you&#8217;d really have
to do something like:</p>

<pre><code>impl of draw_and_bounds for S {
    fn draw() { self.some_other_impl::draw() }
    fn bounds() { self.some_other_impl::bounds() }
}
</code></pre>

<p>But of course the <code>some_other_impl::draw</code> syntax for naming a method
isn&#8217;t implemented, so you&#8217;d have to do something like:</p>

<pre><code>fn my_draw(self: S) { import some_other_impl; self.draw(); }
fn my_bounds(self: S) { import some_other_impl; self.bounds(); }
impl of draw_and_bounds for S {
    fn draw() { my_draw(self) }
    fn bounds() { my_bounds(self) }
}
</code></pre>

<p>But we could fix that by implementing features.</p>
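
<p>For comparison, here is roughly what the explicit, declared version looks
like once forwarding to existing methods is cheap.  This is a sketch in the
syntax of later Rust releases, not something that compiles today, and the
names <code>DrawAndBounds</code> and <code>describe</code> as well as the
return types are made up for the example.</p>

<pre><code>// A type that already has `draw` and `bounds` methods of its own.
struct S;

impl S {
    fn draw(&amp;self) -&gt; &amp;'static str {
        "drawn S"
    }
    fn bounds(&amp;self) -&gt; (u32, u32) {
        (2, 3)
    }
}

// The "after the fact" interface.
trait DrawAndBounds {
    fn draw(&amp;self) -&gt; &amp;'static str;
    fn bounds(&amp;self) -&gt; (u32, u32);
}

// The declared impl just forwards to the methods S already has.
// `S::draw` names the inherent method, so this does not recurse.
impl DrawAndBounds for S {
    fn draw(&amp;self) -&gt; &amp;'static str {
        S::draw(self)
    }
    fn bounds(&amp;self) -&gt; (u32, u32) {
        S::bounds(self)
    }
}

// Code written against the interface alone.
fn describe&lt;T: DrawAndBounds&gt;(t: &amp;T) -&gt; String {
    format!("{} within {:?}", t.draw(), t.bounds())
}

fn main() {
    assert_eq!(describe(&amp;S), "drawn S within (2, 3)");
}
</code></pre>

<p>In a duck-typed world the whole <code>impl DrawAndBounds for S</code> block
would simply be unnecessary.</p>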

<h3>Why not?</h3>

<p>Just because methods with the right <em>names</em> are available doesn&#8217;t mean
that they will do what you expect.  Maybe you mean <code>draw()</code> as in
&#8220;draw your gun&#8221; not &#8220;draw yourself on the screen&#8221;.  It also prevents
&#8216;marker interfaces&#8217;, like Java&#8217;s <code>Serializable</code>.</p>

<h3>Non-obvious implications and small design decisions</h3>

<h4>Simplicity and compilation time</h4>

<p>One of the arguments for a non-duck-typing scenario is that it makes
the system easier to implement.  We can generate the vtable at the
point of impl declaration and then refer to it from other places,
rather than having to generate the vtable lazily as needed.</p>

<p>It seems to me that it would affect compilation time.  It&#8217;s bound to
be faster to check compliance with the iface once, at the <code>impl</code>, than
at each point of invocation.  However, we can cache these results, so
that&#8217;s probably not a big deal.</p>

<h4>Frankenstein impls</h4>

<p>A big open question (to me) is whether we should consider an interface
to be implemented if all the necessary methods are available but they
come from different sources.  For example, consider something like:</p>

<pre><code>impl draw for T { fn draw() { ... } }
impl bounds for T { fn bounds() { ... } }
</code></pre>

<p>Now, in a duck-typing world, is the <code>draw_and_bounds</code> iface
implemented or not?  It seems to involve a similar set of tradeoffs.
If the answer is that it is not implemented, we need to write
something explicit like <code>impl draw_and_bounds for T { ... }</code> just as
we had to do when not using duck typing at all.</p>

<p>Still, I think that we should disallow such &#8220;frankenstein&#8221; impls.  The
main reason is that it makes instance coherence just about impossible
to address (more on that in a later post, but in short form it
prevents us from concisely naming the origin of the iface methods).
It also makes the compiler more complex and heightens the danger of
matching methods with the same names but different semantics.</p>

<p>This is a short-ish post.  I&#8217;m sure there are many details I have
omitted.</p>

<h3>What do I want?</h3>

<p>I don&#8217;t know.  I originally wanted duck typing.  Now I am somewhat
undecided.  I do think Frankenstein impls (something else I originally
wanted) are bad (because of the instance coherence problems I alluded
to).  I think if we make the syntax for &#8220;reusing&#8221; existing methods to
implement a new iface sufficiently compact, it&#8217;s probably not so
painful.  I am not really worried about semantic mismatches: these are
rare in dynamically typed languages, and we have types and other
checks that make such a mismatch unlikely.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Rust's object system]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/04/09/rusts-object-system/"/>
    <updated>2012-04-09T10:05:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/04/09/rusts-object-system</id>
    <content type="html"><![CDATA[<p>On the <code>rust-dev</code> mailing list, someone pointed out another
<a href="http://www.bitc-lang.org/pipermail/bitc-dev/2012-April/003315.html">&#8220;BitC retrospective&#8221; post by Jonathan Shapiro concerning typeclasses</a>.
The Rust object system provides interesting solutions to some of the
problems he raises.  We also manage to combine traditional
class-oriented OOP with Haskell&#8217;s type classes in a way that feels
seamless to me. I thought I would describe the object system as I see
it in a post.  However, it turns out that this will take me far too
long to fit into a single blog post, so I&#8217;m going to do a series.
This first one just describes the basics.</p>

<p>One caveat: I <em>think</em> that these techniques are novel, at least in
some parts. However, I am not well-versed in the Haskell literature
and it&#8217;s possible that the techniques we aim to implement have been
explored already.  If so, I&#8217;d appreciate it if someone would point me
in the right direction!  There are some links in his post that I
haven&#8217;t read, for example, but I will definitely put them on my
reading list.</p>

<p><strong>EDIT</strong>: It&#8217;s a bit unclear what I precisely think is novel.  In
fact, when I wrote the previous paragraph, I was referring to our
proposed technique for enforcing instance coherence.  However, I
didn&#8217;t even describe this problem in this post, because I realized
there was a lot of background to cover.  So, to be clear, I don&#8217;t
think that the basics in this post are terribly novel&#8212;with the
exception of our use of the same interfaces to unify Haskell-style
type-classes (or C++ concepts, if you prefer) with OOP-style
existential (sub)typing.  That particular part works out quite well, I
think.</p>

<h3>The building block: ifaces</h3>

<p>The fundamental building block of Rust&#8217;s OOP system is the <code>iface</code>
(interface).  As in Java and other languages, an iface is just a set
of methods without implementations.  Let&#8217;s use the example of a
<code>hashable</code> value, which might be suitable for use as the key in a
hashtable:</p>

<pre><code>iface hashable {
    fn hash() -&gt; uint;
    fn eq(t: self) -&gt; bool;
}
</code></pre>

<p>This interface provides two methods.  The first, <code>hash()</code>, computes a
hash of the value and the second compares for equality.  You see that
an iface can use the special type <code>self</code>.  The type <code>self</code> means &#8220;the
same type as the receiver&#8221;.  A later post will demonstrate that this
type&#8212;while extremely useful!&#8212;introduces some complications.</p>

<h3>Classes</h3>

<p>Classes are like a pared down version of the classes you will find in
other languages. As in C++, they have fields, methods, constructors
and an optional destructor.  However, they do not inherit from one
another (we will see how to do polymorphism in a bit).  You can define
a class like so:</p>

<pre><code>class a_class {
    let x: int, y: uint;

    new(x: int, y: uint) {
        self.x = x;
        self.y = y;
    }

    fn get_x() -&gt; int { self.x }
}
</code></pre>

<p>The precise syntax will probably change (I am not fond of the
definition of constructors, in particular), but the basic idea will
remain the same: a class combines a set of fields with various
methods.  Members can be defined as private or public with the usual,
C++- or Java-like definition.  Fields can be immutable (the default)
or mutable (<code>let mut x: int</code>).</p>

<h3>Polymorphism using classes and ifaces</h3>

<p>There is no subtyping between classes.  However, sometimes you would
like to have a routine that operates on multiple types.  The canonical
example is to have an interface for &#8220;drawable&#8221; things like:</p>

<pre><code>iface draw {
    fn draw(gfx: graphics_context);
}
</code></pre>

<p>Along with various drawable shapes like:</p>

<pre><code>class square { fn draw(gfx: graphics_context) { ... } }
class circle { fn draw(gfx: graphics_context) { ... } }
...
</code></pre>

<p>Rust then offers you two ways to work with these drawable things.  The
first, interface types, is more like C++ or Java.  The second, bounded
type parameters, is more like Haskell&#8217;s type classes.  As we will see,
each technique is useful for different scenarios.</p>

<h4>Interface types</h4>

<p>As in Java, an interface like <code>draw</code> also has a corresponding type
(simply written as <code>draw</code>).  In fact, it has a family of types
(<code>draw@</code>, <code>draw~</code>, <code>draw&amp;</code>, and <code>draw</code>) just as with function
pointers, but for now there is no need to get into the full details.
The type <code>draw</code> will suffice.</p>

<p>The type <code>draw</code> means &#8220;some value which implements the drawable
interface&#8221;.  We can use the <code>draw</code> type to write a function which
takes a vector of drawable things and draws them all:</p>

<pre><code>fn draw_all(gfx: graphics_context, drawables: [draw]) {
    for drawables.each {|drawable|
        drawable.draw(gfx)
    }
}
</code></pre>

<p>This looks pretty close to Java or C++.  However, what happens at
runtime is somewhat different in some pretty important ways.  For one
thing, the <code>draw</code> type in Rust is represented as the pair of a pointer
to the instance data along with a <a href="http://en.wikipedia.org/wiki/Virtual_method_table">vtable</a>.  Invoking the <code>draw</code>
method, therefore, is simply a matter of extracting the function
pointer from the vtable and invoking it with the instance data as the
(implicit) first argument.</p>

<p>This representation is somewhat different from Java or C++, both of
which would have a single pointer to the object and would embed the
vtable in the object itself.  There are a variety of reasons that we
take a different approach which I will cover later.</p>

<p>The reason I am talking about how <code>draw</code> instances are represented at
runtime is that it is not the same as the way that a <code>@circle</code>
instance (for example) is represented.  The type <code>@circle</code> is just a
pointer to a block of memory containing the fields for the class
circle.  There is only a single pointer and there is no vtable.  So we
cannot simply interpret the type <code>@circle</code> as a <code>draw</code> instance
without doing some conversion.</p>

<p>In Rust, this conversion is accomplished by casting the <code>@circle</code>
instance to the <code>draw</code> type.  So, an example of using the <code>draw_all</code>
method might look like:</p>

<pre><code>fn draw_a_square_and_a_circle(gfx: graphics_context) {
    let s = @square(...);
    let c = @circle(...);
    let objs = [s as draw, c as draw];
    draw_all(gfx, objs);
}
</code></pre>

<p>Here you can see that to construct the vector of drawables, we first
cast <code>s</code> and <code>c</code> to the type <code>draw</code>.  This cast constructs the pair
of the <code>s</code> and <code>c</code> pointers along with the appropriate vtable (in the
first case, one for <code>square</code>, in the second case, one for <code>circle</code>).</p>
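<p>For readers who know present-day Rust, the same pair-of-pointers idea
can be sketched with trait objects.  This is only an analogy written in
modern syntax, not the 2012 dialect used above: <code>GraphicsContext</code> and its
<code>log</code> field are hypothetical stand-ins, and <code>Box&lt;dyn Draw&gt;</code> plays the
role of the <code>draw</code> type.</p>

```rust
use std::mem::size_of;

// Hypothetical stand-in for the post's graphics_context: it just records calls.
struct GraphicsContext {
    log: Vec<String>,
}

trait Draw {
    fn draw(&self, gfx: &mut GraphicsContext);
}

struct Square {
    side: u32,
}

struct Circle {
    radius: u32,
}

impl Draw for Square {
    fn draw(&self, gfx: &mut GraphicsContext) {
        gfx.log.push(format!("square {}", self.side));
    }
}

impl Draw for Circle {
    fn draw(&self, gfx: &mut GraphicsContext) {
        gfx.log.push(format!("circle {}", self.radius));
    }
}

// Analogue of draw_all: each call is dispatched through a vtable.
fn draw_all(gfx: &mut GraphicsContext, drawables: &[Box<dyn Draw>]) {
    for drawable in drawables {
        drawable.draw(gfx);
    }
}

fn draw_a_square_and_a_circle() -> Vec<String> {
    let mut gfx = GraphicsContext { log: Vec::new() };
    let s = Box::new(Square { side: 2 });
    let c = Box::new(Circle { radius: 3 });
    // Each cast pairs the data pointer with the vtable for that type.
    let objs = vec![s as Box<dyn Draw>, c as Box<dyn Draw>];
    draw_all(&mut gfx, &objs);
    gfx.log
}

// Sizes of a plain pointer and of a (data, vtable) pair, in bytes.
fn pointer_widths() -> (usize, usize) {
    (size_of::<Box<Square>>(), size_of::<Box<dyn Draw>>())
}
```

<p>With current compilers the second number returned by <code>pointer_widths()</code>
is twice the first, which is the extra vtable pointer described above.</p>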

<h5>Why is it designed this way?</h5>

<p>There are a variety of reasons that we took a different approach from
that used in Java or C++.  First, we wished to preserve the nice
quality of C++ that all virtual calls are implemented using simple
vtables: this is an efficient technique with reliable performance.  In
Java, in contrast, the precise implementation of interface calls can
vary.  Of course the JIT is able to generally produce efficient code
(typically using <a href="http://en.wikipedia.org/wiki/Inline_caching">PICs</a> or similar things) but we want to be able
to statically compile Rust without the need for just-in-time
techniques.</p>

<p>However, we also did not want to require that classes be pre-declared
as &#8220;implementing&#8221; a particular interface (or, in the case of C++,
extending the given abstract class).  In C++, the subtyping
relationship is used to guide the construction and layout of the
vtables (and, in some cases, multiple such vtables may be needed,
meaning that there is no unique pointer to the object data itself).
Without having that pre-declared relationship, we cannot pre-compute
the vtable(s) for an object in advance.</p>

<p>Therefore, we instead wait and lazily construct the vtable at the
point of the cast (actually, there will be one vtable for each
class-iface pair that appears within a crate).  By representing the
<code>draw</code> instance as the pair of the instance data with the vtable, we
can easily have one class instance associated with any number of
vtables.</p>

<h4>Type classes</h4>

<p>There are two fundamental approaches to writing polymorphic functions
(in general, not just for interface types).  The Java and C++
technique, which we illustrated in the previous section, is to use
subtyping.  Another approach, pioneered in functional languages
(though it is also available in OOP languages) is to use parametric
(or &#8220;generic&#8221;) functions.  For example, we could write a function
<code>draw_many</code> like so:</p>

<pre><code>fn draw_many&lt;D:draw&gt;(gfx: graphics_context, drawables: [D]) {
    for drawables.each {|drawable|
        drawable.draw(gfx)
    }
}
</code></pre>

<p><code>draw_many()</code> looks very similar to <code>draw_all()</code>.  It declares a type
parameter <code>D</code> and says that the type <code>D</code> must implement the <code>draw</code>
iface.  This <code>draw</code> interface is called the <em>bound</em> of the type
parameter <code>D</code>, because it bounds (or &#8220;puts a limit on&#8221;) what types can
be used for <code>D</code>: they must be types for which the interface <code>draw</code> is
available. The function then takes a vector of <code>D</code> instances and iterates over
its contents, invoking the <code>draw()</code> method on each value.</p>

<p>There is in fact a subtle difference between <code>draw_all()</code> and
<code>draw_many()</code>.  <code>draw_all()</code> took a vector of type <code>[draw]</code>: this
means that each entry in the vector may in fact correspond to a
distinct kind of drawable thing.  For example, the vector might have a
square and a circle, as we saw.  <code>draw_many()</code>, in contrast, takes a
vector of type <code>[D]</code>.  This means that the type <code>D</code> could be a square
(which is drawable) or it could be a circle (which is also drawable),
but you cannot have a vector containing both a square <em>and</em> a circle.</p>

<p>To see more closely why this is, consider that at runtime we implement
generic functions like <code>draw_many()</code> by following the C++ approach:
that is, we duplicate the function for each type that it is used with.
Therefore, we can easily create a version of <code>draw_many()</code> for
squares by substituting <code>square</code> for each use of the type <code>D</code>:</p>

<pre><code>fn draw_many&lt;square&gt;(gfx: graphics_context, drawables: [square]) {
    for drawables.each {|drawable|
        drawable.draw(gfx)
    }
}
</code></pre>

<p>We can also create a similar one for circles, but there is no type
(other than <code>draw</code>) that we could use to create a version that accepts
a vector containing <em>both</em> circles and squares.  In fact, there can be
no such vector: all vectors must contain instances of a single type.</p>

<p>Using the type-class style of implementation is generally more
efficient than the traditional OOP-style, because it produces no
vtables at all (but it does produce more code, which has its own
inefficiencies).  This efficiency comes at the price of less
flexibility, because the style cannot deal with heterogeneous
collections.</p>

<p>Actually, this is not strictly true: it is (usually) allowed to
instantiate the type <code>D</code> with an iface type, so we could still invoke
<code>draw_many()</code> with a vector of draw instances, just as we did with
<code>draw_all()</code>.  This would be equally (in)efficient as the OOP version,
because all method calls would still go through a vtable.</p>
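<p>The trade-off can be sketched in modern Rust, again only as an analogy
to the 2012 design.  One assumption to note: in today&#8217;s Rust a boxed
trait object does not satisfy the bound automatically, so the sketch adds
an explicit forwarding <code>impl</code> for <code>Box&lt;dyn Draw&gt;</code>.  The <code>Draw</code>
trait returning a <code>String</code> is a simplification made for the example.</p>

```rust
trait Draw {
    fn draw(&self) -> String;
}

struct Square;
struct Circle;

impl Draw for Square {
    fn draw(&self) -> String {
        "square".to_string()
    }
}

impl Draw for Circle {
    fn draw(&self) -> String {
        "circle".to_string()
    }
}

// Opt-in forwarding impl so that a boxed trait object satisfies `D: Draw`.
impl Draw for Box<dyn Draw> {
    fn draw(&self) -> String {
        (**self).draw() // dispatches through the vtable
    }
}

// One copy of this function is generated per concrete `D` it is used with.
fn draw_many<D: Draw>(drawables: &[D]) -> Vec<String> {
    drawables.iter().map(|d| d.draw()).collect()
}

// D = Square: calls are resolved statically, but every element is a Square.
fn homogeneous() -> Vec<String> {
    draw_many(&[Square, Square])
}

// D = Box<dyn Draw>: mixed contents, at the cost of vtable calls.
fn heterogeneous() -> Vec<String> {
    let objs: Vec<Box<dyn Draw>> = vec![Box::new(Square), Box::new(Circle)];
    draw_many(&objs)
}
```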

<h4>Code reuse via traits</h4>

<p>Inheritance is often used as a means of achieving code reuse in OOP
languages.  While it can be convenient, this is generally regarded as
unfortunate, because it ties together the sub<em>typing</em> relationship
with details about code reuse.  A more modern approach is to make use
of traits.  Rust offers traits but I won&#8217;t go into detail here.  In
effect, traits allow you to factor out common method implementations
in a much more flexible way than inheritance, without introducing the
complications of traditional multiple inheritance.</p>

<h3>Impls</h3>

<p>So far, the only way to define a value with a method is to define a
class and include the method in the class definition.  This is too
limiting, however, in two ways.  First, sometimes we want to define
methods outside the class body&#8212;for example, to extend a class
defined in one crate or module from somewhere else.  Second, not all
types in Rust are classes (for example, ints and vectors) and we don&#8217;t
want them to be, for efficiency reasons and C compatibility.</p>

<p>To address these two needs we allow you to define methods for a given
type using the keyword <code>impl</code>.  For example, suppose we want to add a
method <code>bounds()</code> that computes a bounding rectangle for a shape.  You
might do something like this:</p>

<pre><code>impl bounds for square {
    fn bounds() -&gt; rect { ... }
}
</code></pre>

<p>Here the syntax <code>impl N for T</code> defines a suite of methods named <code>N</code>
for the type <code>T</code>.  You can also associate an <code>impl</code> with an <code>iface</code>
like so:</p>

<pre><code>iface bounds {
    fn bounds() -&gt; rect;
}

impl of bounds for square {
    fn bounds() -&gt; rect { ... }
}
</code></pre>

<p>In this case, the name of the method suite is (by default) the name of
the iface.  The full syntax is <code>impl N of I for T</code>.</p>

<p>Using an <code>impl</code>, we can generalize interfaces to apply to arbitrary
types.  For example, we could implement the <code>draw</code> interface for a
<code>uint</code> (whatever that means):</p>

<pre><code>impl of draw for uint {
    fn draw(gfx: graphics_context) { ... }
}
</code></pre>

<p>Then a <code>[uint]</code> could be passed to <code>draw_many()</code>.  Similarly, we could
cast a <code>uint</code> to <code>draw</code>.</p>

<h4>Scoping of impls</h4>

<p>In order to make use of the methods in an <code>impl</code>, you must bring the
<code>impl</code> into scope using an import statement.  This is where the impl
name comes into play.  So, to use the <code>bounds</code> method from another
module, I must include something like:</p>

<pre><code>import B::bounds;
</code></pre>

<p>where <code>B</code> is the module containing the <code>impl</code> declarations.  The same
visibility rules apply when trying to cast a type to an iface or use
the type as the value for a bounded generic type parameter.</p>
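<p>Modern Rust kept a similar rule, which makes for a convenient
illustration: methods supplied by an impl are only callable where the
corresponding trait is in scope.  The sketch below is in today&#8217;s syntax,
and the module <code>b</code>, the <code>Rect</code> fields and the <code>u32</code> impl are all
made up for the example.</p>

```rust
mod b {
    pub struct Rect {
        pub w: u32,
        pub h: u32,
    }

    pub struct Square {
        pub side: u32,
    }

    pub trait Bounds {
        fn bounds(&self) -> Rect;
    }

    impl Bounds for Square {
        fn bounds(&self) -> Rect {
            Rect { w: self.side, h: self.side }
        }
    }

    // Methods can be attached to built-in types as well.
    impl Bounds for u32 {
        fn bounds(&self) -> Rect {
            Rect { w: *self, h: 1 }
        }
    }
}

// Without this import the `.bounds()` calls below would not resolve.
use self::b::Bounds;

fn square_bounds() -> (u32, u32) {
    let r = b::Square { side: 4 }.bounds();
    (r.w, r.h)
}

fn uint_bounds() -> (u32, u32) {
    let r = 9u32.bounds();
    (r.w, r.h)
}
```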

<h3>Mismatches</h3>

<p>To some extent, the class and impl systems were independently designed,
and there are a few mismatches (mostly in code that has not been fully
implemented).  The main one is that classes satisfy interfaces by duck
typing (nothing is declared), whereas impls declare the iface they
implement.  We will align the two, probably initially by
adding the ability to declare an interface when you declare a class.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[For loops]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/04/06/for-loops/"/>
    <updated>2012-04-06T09:51:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/04/06/for-loops</id>
    <content type="html"><![CDATA[<p>First off, I want to welcome <a href="http://brson.github.com/">Brian Anderson</a> to the Rust blog-o-sphere
(which so far consists primarily of myself).  His <a href="http://brson.github.com/rust/2012/04/05/new-for-loops/">first post</a>
does a great job of explaining how to use the new <code>for</code> syntax that
was recently added to Rust: this syntax allows for <code>break</code>, <code>ret</code>, and
<code>cont</code> from within user-defined loops, which is very nice.</p>

<p>Reading some of the <a href="http://news.ycombinator.com/item?id=3806152">Hacker News comments</a>
(<a href="http://news.ycombinator.com/item?id=3807365">this one in particular</a>), I wanted to clarify one thing.  There
is some concern that this new syntax changes the semantics of <code>ret</code>
when, in fact, it aims to do precisely the opposite.</p>

<p>The goal is that <code>ret</code> always returns from the innermost enclosing
<code>fn()</code> declaration.  Sugared closures (e.g., <code>{|x| ...}</code>) do not count
as an fn-declaration, but a closure written out with <code>fn()</code> does.  If
you use <code>ret</code> from a context where the compiler cannot generate a
return from the innermost enclosing <code>fn()</code> declaration, a static error
results.</p>

<p>Here are some examples of what I mean.  First, the basic <code>for</code> loop:</p>

<pre><code>fn foo() {
    for each(v) {|e|
        ret e; // returns from foo()
    }
}
</code></pre>

<p>Here I am using an <code>fn@()</code> closure:</p>

<pre><code>fn foo() {
    let bar = fn@() -&gt; T {
        for each(v) {|e|
            ret ...; // returns from bar()
        }
        ret ...; // returns from bar()
    };

    ret ...; // returns from foo()
}
</code></pre>

<p>and here is an example where an error results:</p>

<pre><code>fn foo() {
    iter(v) {|e|
        ret e; // should return from foo(), but cannot
    }
}
</code></pre>

<p>Note that returning out of sugared closures like <code>{||...}</code> is only
allowed in the context of a <code>for</code> loop, since it requires additional
code generation to support.</p>
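<p>For comparison, here is how the two cases look in present-day Rust,
where <code>for</code> is built into the language and every closure counts as its
own function for the purposes of <code>return</code>.  This is a sketch in modern
syntax rather than the 2012 design described above; the function names are
made up for the example.</p>

```rust
// `return` inside a built-in `for` loop leaves the enclosing function.
fn first_even(v: &[i32]) -> Option<i32> {
    for &e in v {
        if e % 2 == 0 {
            return Some(e); // returns from first_even()
        }
    }
    None
}

// Inside a closure, `return` only leaves the closure itself,
// which here means skipping to the next element.
fn count_evens(v: &[i32]) -> usize {
    let mut n = 0;
    v.iter().for_each(|&e| {
        if e % 2 != 0 {
            return; // returns from the closure, not from count_evens()
        }
        n += 1;
    });
    n
}
```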
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Servo design]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/03/28/servo-design/"/>
    <updated>2012-03-28T08:30:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/03/28/servo-design</id>
    <content type="html"><![CDATA[<p>Yesterday we had a hackathon/meeting to discuss the overarching design
of Servo, the project to build a next-generation rendering engine.  We
didn&#8217;t ultimately do much hacking (though we did a little), but mostly
we tried to hammer out the big picture so that we can actually get to
writing code.  I wanted to try and write up what I understood as the
consensus (for the moment, anyway).</p>

<h3>The big picture</h3>

<p>There will be (at least) three large components.  Each is basically
operating in independent tasks and the various stages are therefore
largely isolated from one another and able to execute independently
(with certain exceptions, as we shall see):</p>

<ul>
<li>JS</li>
<li>Layout</li>
<li>Painting</li>
</ul>


<p>There are several data structures that will be maintained by these
different stages:</p>

<ul>
<li>The DOM</li>
<li>The &#8220;Layout Tree&#8221; (CSS boxes corresponding to each DOM element)</li>
<li>The &#8220;Display Tree&#8221; (what to draw at each location)</li>
<li>Various other structures:

<ul>
<li>backing store(s) for canvas etc.</li>
</ul>
</li>
</ul>


<p>I&#8217;ll go over each in turn.</p>

<h3>The DOM and Layout Tree</h3>

<p>The most interesting&#8212;and complex&#8212;part of the design centers around
the representation of the DOM.  We want the ability for layout to
execute in parallel with the JS itself.  However, both layout and JS
require access to the DOM; and, of course, the JS may choose to modify
the DOM at any time, and those changes should eventually be reflected
in the layout. Initially the plan was to overcome this by having two
DOMs: the main DOM, accessible to JS, and the shadow DOM, accessible
to layout. The shadow DOM would be kept up-to-date by messages from
the JS.  The problem with this plan is simply overhead: based on our
own experiments as well as feedback from <a href="http://www.cs.brown.edu/~blerner/">Ben Lerner</a>, we
decided this is not the best approach.</p>

<p>An alternative that we are considering instead is what we call the RCU
approach.  The name derives from the <a href="http://en.wikipedia.org/wiki/Read-copy-update">read-copy-update</a> pattern
used extensively in the Linux kernel.  The idea itself was also
inspired by the work on <a href="http://research.microsoft.com/apps/pubs/default.aspx?id=132619">Concurrent Revisions</a> by Burckhardt et
al. at MSR.</p>

<p>In a nutshell, the idea of the RCU plan is that when the JS node kicks
off a layout task, it will preserve the version of the DOM that the
layout is reading.  So any changes that occur while layout is active
must take place on a copy of the DOM.  Of course, it would be too
expensive to do a deep copy of the DOM when layout activates, and
<a href="http://en.wikipedia.org/wiki/Persistent_data_structure">traditional persistent data structures</a> like maps and vectors
are not much help either.</p>

<p>One key ingredient for any RCU-like plan is that it must be possible to
know when readers are active.  It turns out that we should be able to
track this for layout and JS.  Basically, the JS task is the &#8220;driver&#8221;:
it decides when to start layout and may, in some cases, have to block
waiting for layout to terminate.</p>

<p>You can think of the main JS task as operating in a loop something
like this:</p>

<pre><code>layout_active = false;
dirty_nodes = NULL;
loop {
   execute_JS();

   if (dirty_nodes) {
       if (layout_active) {
           join_layout();
       }
       spawn_layout();
       layout_active = true;
   }
}
</code></pre>

<p>It can also happen that the JS requests the computed style information
or layout.  In this case, the JS must first join the layout task
(and, if the tree is dirty, it may have to spawn the task too!).</p>

<p>Our plan instead is to replace each pointer to a DOM node (<code>node*</code>)
with a handle (<code>rcu&lt;node&gt;*</code>).  This handle will be a structure like
the following:</p>

<pre><code>struct rcu&lt;T&gt; {
    T *wr_ptr;
    T *rd_ptr;
    rcu&lt;T&gt; *next_dirty;
};
</code></pre>

<p>The <code>wr_ptr</code> points at the current version of the node, whereas the
<code>rd_ptr</code> points at the version of the node that layout is operating
on.  At the moment when a layout task is spawned, <code>rd_ptr</code> and
<code>wr_ptr</code> are always the same.  Whenever JS wishes to make
modifications and layout is active, it follows an algorithm something
like this:</p>

<pre><code>void dirty(rcu&lt;T&gt; *handle) {
    if (handle-&gt;wr_ptr != handle-&gt;rd_ptr)
        return; // already dirty
    if (!layout_active)
        return; // doesn't matter

    handle-&gt;wr_ptr = new T(*handle-&gt;rd_ptr); // copy rd data
    handle-&gt;next_dirty = dirty_nodes;
    dirty_nodes = handle;

    return;
}
</code></pre>

<p>After this, it is safe for the code to make changes to the contents of
<code>handle-&gt;wr_ptr</code>.</p>

<p>The final step is to reset the <code>rd_ptr</code> to the <code>wr_ptr</code>.  This occurs
once layout is completed.  For example, we might implement the
<code>join_layout()</code> routine like so:</p>

<pre><code>void join_layout() {
    layout_task-&gt;join();

    // Reset read and write pointers:
    rcu&lt;T&gt; *p = dirty_nodes, *pn;
    while (p != NULL) {
        pn = p-&gt;next_dirty;
        p-&gt;rd_ptr = p-&gt;wr_ptr;
        p-&gt;next_dirty = NULL;
        p = pn;
    }
    dirty_nodes = NULL;
}
</code></pre>
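<p>To make the read/write split concrete, here is a single-threaded model
of the scheme in Rust.  It is a sketch under several assumptions, not the
planned implementation: there is no real layout task, nodes live in a
vector and dirty nodes are tracked by index rather than by a
<code>next_dirty</code> link, and instead of copying the node for the writer it
saves the old value for the reader (<code>snapshot</code>), which is observably
equivalent here.  All names are made up for the example.</p>

```rust
// One DOM slot. `cur` is what JS sees (the wr_ptr target); `snapshot`,
// when present, is what layout sees (the rd_ptr target).
struct Rcu<T> {
    cur: T,
    snapshot: Option<T>,
}

struct Dom<T> {
    nodes: Vec<Rcu<T>>,
    dirty: Vec<usize>,
    layout_active: bool,
}

impl<T: Clone> Dom<T> {
    fn new(values: Vec<T>) -> Self {
        let nodes = values
            .into_iter()
            .map(|v| Rcu { cur: v, snapshot: None })
            .collect();
        Dom { nodes, dirty: Vec::new(), layout_active: false }
    }

    fn spawn_layout(&mut self) {
        self.layout_active = true;
    }

    // JS side: the analogue of dirty() followed by a write via wr_ptr.
    fn write(&mut self, i: usize) -> &mut T {
        let node = &mut self.nodes[i];
        if self.layout_active && node.snapshot.is_none() {
            // First write during layout: preserve the version layout reads.
            node.snapshot = Some(node.cur.clone());
            self.dirty.push(i);
        }
        &mut node.cur
    }

    // Layout side: the analogue of reading via rd_ptr.
    fn read_for_layout(&self, i: usize) -> &T {
        let node = &self.nodes[i];
        node.snapshot.as_ref().unwrap_or(&node.cur)
    }

    // Analogue of join_layout(): drop the snapshots so both views agree.
    fn join_layout(&mut self) {
        for i in self.dirty.drain(..) {
            self.nodes[i].snapshot = None;
        }
        self.layout_active = false;
    }
}

fn demo() -> (i32, i32, i32) {
    let mut dom = Dom::new(vec![10, 20]);
    dom.spawn_layout();
    *dom.write(0) = 11;
    let during = *dom.read_for_layout(0); // layout still sees the old value
    let js_view = dom.nodes[0].cur;       // JS sees its own write
    dom.join_layout();
    let after = *dom.read_for_layout(0);  // views agree again
    (during, js_view, after)
}
```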

<p>Note: the small details of this implementation will probably
change. For example, it might be better to store the <code>dirty_nodes</code> in
a vector instead of a linked list, or at least pull the <code>next_dirty</code> field
out somewhere else (this would for example make sense if the
proportion of dirty to clean nodes is small, as expected).  But you
get the idea (I hope, anyway).</p>

<p>So now that we&#8217;ve explained the basics, let&#8217;s look at a few variations.</p>

<h4>Separating layout into phases</h4>

<p>I described layout as one monolithic entity.  But in fact it can be
useful to separate it into multiple parts.  For example, some JS calls
require that style computation be completed, but do not require that
the actual layout boxes be computed nor that the geometry is complete.
Therefore, we can break the layout task into multiple tasks, allowing
the JS to join just the phase that it requires (as well as allowing it
to spawn a task which will only perform the style computation, and so
forth).</p>

<h4>Triggering layout at other times</h4>

<p>For things like CSS animations, we would like to be able to trigger an
animation even while the JS is active.  We can do this without great
difficulty thanks to the periodic callback which the JS makes every N
operations or so.  Basically, when the animation is ready to begin the
next layout step, it will asynchronously set a flag
(<code>animation_requires_layout</code>).  During the JS callback, if layout is
inactive but <code>animation_requires_layout</code> is true, then it will spawn
off a layout task.  Any writes which occur after that point will have
to be RCU&#8217;d.</p>

<p>One issue with this which I can see: the layout task will see whatever
DOM modifications had occurred up until the point of the interrupt.
This doesn&#8217;t seem immediately desirable to me.  It could be
circumvented by tracking dirty nodes even when layout is inactive, and
just resetting the <code>rd_ptr</code>s every turn of the JS event loop.</p>

<h4>Other stuff</h4>

<p>We have to be careful around the backing buffers for Canvas layers and
other such data structures.  This doesn&#8217;t seem especially hard but
we&#8217;ll want to think about it.  Most likely Canvas will need to be
double-buffered and we&#8217;ll just swap the buffers at the same point we
adjust the <code>rd_ptr</code> (when you think about it, the RCU scheme is basically
double-buffering for the DOM).</p>

<h4>Memory management</h4>

<p>Writing this up has brought some questions to mind.  Primarily my
concerns center around memory management.  Garbage collection
operating in the JS task while layout is active will have to be quite
careful.  It can safely trace through both the rd and wr ptrs for DOM
nodes, but if there are links from the DOM nodes to the computed style
and layout information (which we had thought to have), then it is not
safe for the GC in the JS task to look at those.  The layout may be
concurrently modifying them after all.  There is also the matter of managing
the memory for the layout data structures.</p>

<p>One solution is for GC to simply join the layout task before it begins
execution.  Or, similarly, to distinguish small collections&#8212;in which
we ignore layout data structures&#8212;from large collections, in which
case layout must be joined.  This is probably good enough.</p>

<h3>Painting</h3>

<p>When layout finishes, it can perform a paint by building up what is
now called a display tree&#8212;basically a list of rectangles to draw and
their contents&#8212;and send this off to the display task.  The display
task is then charged with walking this tree, rasterizing its contents,
and blitting the data to the screen.  This process can be done in a
very parallel way using rather simple techniques (blit any
non-overlapping rectangles in parallel, etc).  It can also use simple
caching to avoid expensive rasterizations, as Gecko does today. In
short, it seems fairly straightforward.</p>

<h3>The plan</h3>

<p>We hope to quickly build up a fairly rudimentary form of Servo based
on this architecture.  The layout algorithms themselves will probably
be initially implemented in a sequential fashion.  We can still get
quite a lot of simple parallelism from various pieces of low-hanging
fruit: selector matching, painting, etc.  And we get quite a bit of
pipeline parallelism and responsiveness by separating the various
tasks.  But eventually of course we hope to parallelize the layout pipeline
itself.</p>

<p>One important point which bz raised is that sometimes the raw
performance of layout is not terribly important&#8212;but it is important
that the browser stay responsive.  The fact that Gecko must do layout
on the main thread harms responsiveness, something which we should be
able to avoid.</p>

<h3>A final note</h3>

<p>One disappointing aspect of this plan is that the existing static
data-race verification techniques are so inadequate to the problem.
Actor-based solutions require total separation of the DOM trees,
leading to unacceptable overhead.  Simple parallelism like that
offered by painting will likely never be analyzable by any simple
static regimen: it would have to be able to reason about the fact that
painting two rectangles which do not overlap is a commutative
operation.  The RCU-like plan would of course be very hard to
statically analyze, though if you build something like that into your
language as a base abstraction&#8212;as with the
<a href="http://research.microsoft.com/apps/pubs/default.aspx?id=132619">Concurrent Revisions</a> work&#8212;that might work out well.</p>

<p>In general though I believe that a balanced approach to race detection
is best: statically verify where you can, accept the limited use of
dynamic schemes otherwise.  And we should be able to statically verify
simpler subproblems, for example, ensuring that layout only reads data
that is reachable via the <code>rd_ptr</code> and that JS only writes to data
reachable via the <code>wr_ptr</code> (this can be solved by a simple ADT which
only grants access via one pointer or the other).</p>
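<p>A minimal sketch of such an ADT in Rust, using module privacy to hide
both versions of the data.  It is only meant to show the access
restriction: the names are hypothetical, both versions are stored by value
rather than behind pointers, and Rust&#8217;s borrow rules mean the two views
cannot be alive at the same time, so this is not a concurrent design.</p>

```rust
mod rcu {
    // Both fields are private: outside this module the data can only be
    // reached through one of the two views below.
    pub struct Handle<T> {
        wr: T,
        rd: T,
    }

    // Read-only view for layout: exposes only the `rd` version.
    pub struct LayoutView<'a, T>(&'a Handle<T>);

    // Mutable view for JS: exposes only the `wr` version.
    pub struct JsView<'a, T>(&'a mut Handle<T>);

    impl<T: Clone> Handle<T> {
        pub fn new(value: T) -> Self {
            Handle { wr: value.clone(), rd: value }
        }

        pub fn layout_view<'a>(&'a self) -> LayoutView<'a, T> {
            LayoutView(self)
        }

        pub fn js_view<'a>(&'a mut self) -> JsView<'a, T> {
            JsView(self)
        }

        // The reset step performed once layout has been joined.
        pub fn publish(&mut self) {
            self.rd = self.wr.clone();
        }
    }

    impl<'a, T> LayoutView<'a, T> {
        pub fn get(&self) -> &T {
            &self.0.rd
        }
    }

    impl<'a, T> JsView<'a, T> {
        pub fn get_mut(&mut self) -> &mut T {
            &mut self.0.wr
        }
    }
}

fn demo() -> (i32, i32) {
    let mut h = rcu::Handle::new(1);
    *h.js_view().get_mut() = 2;
    let during = *h.layout_view().get(); // layout still reads the old version
    h.publish();
    let after = *h.layout_view().get();  // now the write is visible
    (during, after)
}
```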
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Avoiding region explosion in Rust]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/03/28/avoiding-region-explosion-in-rust/"/>
    <updated>2012-03-28T06:18:00-07:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/03/28/avoiding-region-explosion-in-rust</id>
    <content type="html"><![CDATA[<p>pcwalton and I (but mostly pcwalton) have been hard at work
implementing regions in Rust.  We are hoping to use regions to avoid a
lot of memory allocation overhead in the compiler&#8212;the idea is to use
memory pools (a.k.a. arenas) so that we can cheaply allocate the data
needed to process a given function and then release it all in one
shot.  It is well known that arenas are a great fit for the memory
allocation patterns of a compiler, which tend to produce a lot of data
that lives for the duration of a pass but is not needed afterwards.</p>

<p>In any case, recently we had a discussion about how we can use
regions in the <code>trans</code> pass of the compiler: this is the pass which
converts from our internal representation (IR) to LLVM&#8217;s IR.  I
thought it was worth sharing the result of this discussion.  The basic
summary is that we are able to make use of region subtyping to
accommodate a fairly complex pattern of lifetimes with very little
annotation overhead.</p>

<h3>The setting: contexts in trans</h3>

<p>First, let me introduce the problem: during translation, we produce a
lot of &#8220;contexts&#8221;, which store needed data about the state of the
translation.  For our purposes, there are three contexts of note: the
<em>crate context</em>, or <code>ccx</code>, which stores crate-wide data such as
linkage information about top-level functions, constants, and so
forth; the <em>function context</em>, or <code>fcx</code>, which stores per-function
data such as references to the LLVM variables representing its
parameters and locals; and finally the <em>block context</em>, or <code>bcx</code>,
which stores information about a single basic block in the
<a href="http://en.wikipedia.org/wiki/Control_flow_graph">control-flow graph</a>.</p>

<p>What we would like to do is to create the crate context <code>ccx</code> on the
stack when we enter the translation phase for the crate as a whole.
Later, when we begin to translate a given function, we will allocate
its function context <code>fcx</code> on the stack as well.  The block contexts,
however, are a little different: they do not fully obey a stack
discipline.  That is, it is common for a function to create a new
block context and return it to its caller, perhaps with a signature
like the following:</p>

<pre><code>fn compile_if_then_else(bcx0: @block_ctxt,
                        cond: @expr,
                        then_blk: @code_block,
                        else_blk: @code_block) -&gt; @block_ctxt
</code></pre>

<p>This function would presumably generate the
<a href="http://en.wikipedia.org/wiki/File:If-then-else-control-flow-graph.svg">diamond-shaped if-then-else pattern</a>.  The initial block is the
block represented by <code>bcx0</code>.  The function will compile the condition
<code>cond</code> and generate a branch to one of two new basic blocks representing
the true and false paths.  The code might look something like this
(note: this is not the actual code in rustc, which is naturally much
messier):</p>

<pre><code>let (bcx1, val) = compile_expr(bcx0, cond);
let mut bcx_true = new_bcx(bcx0.fcx);
let mut bcx_false = new_bcx(bcx0.fcx);
add_instr(bcx1, if(val, bcx_true, bcx_false));
</code></pre>

<p>The then and else blocks could then be compiled in the contexts of those
true and false blocks:</p>

<pre><code>bcx_true = compile_block(bcx_true, then_blk);
bcx_false = compile_block(bcx_false, else_blk);
</code></pre>

<p>And finally the two paths can be merged into a new block, which is the block
that gets returned:</p>

<pre><code>let bcx_join = new_bcx(bcx0.fcx);
add_instr(bcx_true, goto(bcx_join));
add_instr(bcx_false, goto(bcx_join));
ret bcx_join;
</code></pre>

<h3>The problem: expressing context lifetimes with regions</h3>

<p>Let&#8217;s dig a bit more into the representation of these contexts.  The
details aren&#8217;t too important but I want to focus on the region-related
aspects that describe their lifetimes.  Remember that there is a crate
context <code>ccx</code> that is valid for the translation of the entire crate.
Its contents are not important, so let&#8217;s just assume it&#8217;s some record:</p>

<pre><code>type crate_ctxt = {
     ...
};
</code></pre>

<p>Then there is a function context.  It contains a pointer to the crate context,
along with some other stuff:</p>

<pre><code>type func_ctxt = {
    ccx: &amp;crate_ctxt,
    ...
};
</code></pre>

<p>Finally the block context, which contains a pointer to the function context:</p>

<pre><code>type block_ctxt = {
    fcx: &amp;func_ctxt,
    ...
};
</code></pre>

<p>Here I have shown the pointers as region pointers, but I haven&#8217;t
written any explicit region annotations.  The question is, what
regions should we associate with those pointers?</p>

<h3>The maximally expressive approach</h3>

<p>If you wanted to take the maximally expressive approach, you would
wind up with a lot of region parameters.  For now I will show this in
a very explicit syntax in which types are given explicit Region
parameters, but this syntax is not valid Rust and (hopefully) never
will be:</p>

<pre><code>type crate_ctxt = {
     ...
};

type func_ctxt&lt;&amp;c&gt; = {
    ccx: &amp;c.crate_ctxt,
    ...
};

type block_ctxt&lt;&amp;f,&amp;c&gt; = {
    fcx: &amp;f.func_ctxt&lt;&amp;c&gt;,
    ...
};
</code></pre>

<p>You can see the problem.  The type for the block context must be
annotated with two region parameters, one to describe the region of
the function context and one for the crate context.</p>

<p>In this technique, if we have a variable <code>bcx</code> of type
<code>&amp;b.block_ctxt&lt;&amp;f,&amp;c&gt;</code>, then <code>bcx.fcx.ccx</code> will have type
<code>&amp;c.crate_ctxt</code>: the precisely correct region, presumably.</p>

<p>For reference, the signature of <code>compile_if_then_else()</code> would become:</p>

<pre><code>fn compile_if_then_else(bcx0: &amp;b.block_ctxt&lt;&amp;f,&amp;c&gt;,
                        cond: @expr,
                        then_blk: @code_block,
                        else_blk: @code_block) -&gt; &amp;b.block_ctxt&lt;&amp;f,&amp;c&gt;
</code></pre>

<h3>The minimally expressive approach</h3>

<p>The approach we plan to take is much simpler.  Types do not have
region parameters.  Instead, when we instantiate an <code>&amp;T</code> type to a
specific region, the outermost <code>&amp;</code> in a function prototype is assigned
a fresh region, and any <code>&amp;</code> that appears within that type is assigned
this same fresh region.  This means that if we have a variable <code>bcx</code>
with type <code>&amp;b.block_ctxt</code>, then <code>bcx.fcx.ccx</code> will have type
<code>&amp;b.crate_ctxt</code>: this
is an underapproximation of the lifetime of the crate context.  The
true lifetime is <code>&amp;c</code>, which is some superset of <code>&amp;b</code>.  The reason that
this whole scheme type checks is the subtyping
relationship between region pointers: a reference with a longer
lifetime (like <code>&amp;c.crate_ctxt</code>) can be used wherever a reference with a
shorter lifetime (like <code>&amp;b.crate_ctxt</code>) is expected.</p>

<p>Under this approach, the signature of <code>compile_if_then_else()</code> becomes:</p>

<pre><code>fn compile_if_then_else(bcx0: &amp;b.block_ctxt,
                        cond: @expr,
                        then_blk: @code_block,
                        else_blk: @code_block) -&gt; &amp;b.block_ctxt
</code></pre>

<p>Not so bad.</p>
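<p>The same trick can be tried out in present-day Rust, where regions
became lifetimes.  In the sketch below (modern syntax, with made-up field
names such as <code>name</code>, <code>fn_name</code> and <code>label</code>), each context type
carries a single lifetime parameter and the compiler shortens the longer
lifetimes to fit, which is the subtyping relationship described above.</p>

```rust
struct CrateCtxt {
    name: String,
}

// One lifetime parameter covers everything reachable from the context.
struct FuncCtxt<'a> {
    ccx: &'a CrateCtxt,
    fn_name: &'a str,
}

struct BlockCtxt<'a> {
    fcx: &'a FuncCtxt<'a>,
    label: u32,
}

// The crate context really lives longer than 'b, but all we learn from
// this signature is that it lives at least as long as the block context.
fn crate_of<'b>(bcx: &'b BlockCtxt<'b>) -> &'b CrateCtxt {
    bcx.fcx.ccx
}

fn describe<'b>(bcx: &'b BlockCtxt<'b>) -> String {
    format!("{}::{}#{}", crate_of(bcx).name, bcx.fcx.fn_name, bcx.label)
}

fn demo() -> String {
    let ccx = CrateCtxt { name: "krate".to_string() }; // longest-lived
    let fcx = FuncCtxt { ccx: &ccx, fn_name: "foo" };
    let bcx = BlockCtxt { fcx: &fcx, label: 0 };       // shortest-lived
    describe(&bcx)
}
```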

<h3>Arenas and placement new</h3>

<p>One question remains: because the lifetime of block contexts is not
bound by the call stack, how can we manage their allocation without
resorting to heap allocation (the function context and crate context
can be allocated on the stack)? The answer is that we will use arenas.</p>

<p>An arena is basically a pool of memory in which we can allocate lots
of data and then release the pool all in one shot.  This is very cheap
but only suitable for places where allocation follows a &#8220;phase-based&#8221;
pattern.</p>

<p>We will use a memory pool which is allocated and released per-function.
Therefore, the pool itself will be stored in the function context:</p>

<pre><code>type func_ctxt = {
    ccx: &amp;crate_ctxt,
    pool: &amp;memory_pool,
    ...
};
</code></pre>

<p>In the current Rust type system, anyhow, a memory pool can be any type
for which there exists an <code>impl</code> offering an <code>alloc(sz: uint, align:
uint) -&gt; *()</code> method, which allocates <code>sz</code> bytes of memory at the
given alignment and returns a pointer.  An expression like <code>new (pool)
value</code> will cause <code>pool.alloc()</code> to be invoked and will then store the
value into the memory location that was returned.  The result is a
region pointer in the same region as the pool itself.</p>

<p>This means that allocating a new block context looks something like:</p>

<pre><code>fn new_bcx(fcx: &amp;f.func_ctxt) -&gt; &amp;f.block_ctxt {
    new (fcx.pool) {fcx: fcx, ...}
}
</code></pre>

<h3>The Summary</h3>

<p>The basic idea of the approach is to retain less information.  For a
given region pointer <code>p</code>, all you know is that any data reachable via
some path like <code>p.f.g.h</code> will be live as long as <code>p</code> is live.  It
<em>seems</em> that this is enough in practice for most real use cases. Time
will tell, I suppose.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Progress on inlining]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/03/03/progress-on-inlining/"/>
    <updated>2012-03-03T06:44:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/03/03/progress-on-inlining</id>
    <content type="html"><![CDATA[<p>Cross-crate inlining has come a long way and is now basically
functional (I have yet to write a comprehensive test suite, so I&#8217;m
sure it will fail when exercising various corners of the language).</p>

<p>Just for fun, I did some preliminary micro-benchmarks.  The results
are not that surprising: removing method call overhead makes programs
run faster! But it&#8217;s still nice to see things go faster.  We&#8217;ll look
at the benchmarks, see the results, and then dive into the generated
assembly.  In all cases, I found LLVM doing optimizations that rather
surprised me.</p>

<h3>How to use it</h3>

<p>Actually, <code>rustc</code> has been doing inlining without any special
annotations for a long time&#8212;but only within one crate.  If you want
to enable a function to be inlined when called from another crate, you
simply have to add an <code>#[inline]</code> annotation to it, like so:</p>

<pre><code>#[inline]
fn range(lo: uint, hi: uint, it: fn(uint)) {
    let i = lo;
    while i &lt; hi { it(i); i += 1u; }
}
</code></pre>

<p>This is the <code>uint::range()</code> function, which simply invokes its
argument on every integer in a particular range.</p>

<p>The reason that an annotation is required to inline calls to functions
in other crates is that cross-crate inlining complicates the
recompilation model.  Normally, crates are dynamically linked, so if
you change the implementation of a function but not its type
signature, then there is no need to recompile dependent crates or
programs.  However, if an <em>inlined</em> function is changed, then every
caller <em>must</em> be recompiled in order to observe that change, as the
source of that function will have been inlined into their local
compilation units (of course, if the inlined function is not exported
or not used, then there is again no need to recompile dependent
crates).</p>

<p>The <code>#[inline]</code> directive currently takes one option: you can write
<code>#[inline(always)]</code>.  The difference is that plain <code>#[inline]</code> is a hint,
which the compiler may choose to ignore.  The <code>always</code> directive makes
the hint stronger, causing the compiler to ignore the typical
heuristics and thresholds that it uses to decide when to inline.
Currently, these hints are passed on directly to LLVM; unfortunately,
I have found that if you do not write <code>#[inline(always)]</code>, LLVM almost
always chooses not to inline, so probably we have to adjust the
heuristics somewhat for Rust code.</p>
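
<p>Both spellings of the attribute still exist in present-day Rust, so a
rough modern rendering of the two forms might look like the sketch
below.  The helper <code>sum_below()</code> is a made-up name, and the closure
syntax is the current one, not the one used elsewhere in this post.</p>

<pre><code>// `#[inline]` is a hint that the compiler is free to ignore.
#[inline]
pub fn range(lo: usize, hi: usize, mut it: impl FnMut(usize)) {
    let mut i = lo;
    while i &lt; hi {
        it(i);
        i += 1;
    }
}

// `#[inline(always)]` is a stronger request to inline at call sites.
#[inline(always)]
pub fn sum_below(n: usize) -&gt; usize {
    let mut sum = 0;
    range(0, n, |i| sum += i);
    sum
}

fn main() {
    println!("{}", sum_below(10));
}
</code></pre>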

<h3>Benchmark #1: <code>uint::range</code></h3>

<p><code>uint::range</code> is Rust&#8217;s way of iterating over a range of integers.
The following simple program simply sums up the integers from <code>0</code>
to <code>N</code>, where <code>N</code> is provided on the command line:</p>

<pre><code>fn main(args: [str]) {
    let r = option::get(uint::from_str(args[1]));
    let sum = 0u;
    uint::range(0u, r) {|i|
        sum += i;
    }
    io::print(#fmt["Sum from 0 to %u is %u\n", r, sum]);
}
</code></pre>

<p>Before inlining, this program would literally create a stack closure
for the body of the loop and pass it to the library function range
(the source of which was shown above).  Range would then iterate and
invoke the closure on every iteration.</p>

<p>We&#8217;ll look at the generated assembly shortly.  But first, let&#8217;s see
some simple performance measurements:</p>

<pre><code>; rustc -O --inline --monomorphize ~/tmp/iterator.rs
; time ~/tmp/iterator 10000000000
Sum from 0 to 10000000000 is 13106511847580896768

real    0m0.016s
user    0m0.010s
sys     0m0.006s
; rustc -O ~/tmp/iterator.rs -o ~/tmp/iterator-no-inline
; time ~/tmp/iterator-no-inline 10000000000
Sum from 0 to 10000000000 is 13106511847580896768

real    0m48.217s
user    0m48.203s
sys     0m0.014s
</code></pre>

<p>As the <code>--inline</code> flag suggests, the inlining optimizations are still
not enabled by default.  Compilation does succeed with inlining
enabled, at least on my machine (or it did when I last tested it), but
I am still not happy with the auto-generation of the serialization
code and so I did not want to have the main build of the compiler
depend on it yet.  However, there is a big difference between the
inlined and non-inlined versions of this benchmark!  The non-inlined
form took about 3000 times as long!  We&#8217;ll see why this is when we dig
into the generated assembly.  The reasons surprised me a bit.</p>

<h4>Generated assembly</h4>

<p>A (somewhat simplified and annotated) extract of the generated
assembly for the <code>uint::range()</code> example is below.  Actually, LLVM is
amusingly both <em>extremely</em> smart and kind of dumb here.  The actual
computation of the sum has been removed and turned into an algebraic
formula.  After that formula is computed, then there is a useless
little while loop that just iterates from 0 to n doing nothing:</p>

<pre><code>  ...
Ltmp3:
  ; initialize sum to 0u
  ; and branch out if `r` is 0
  movq    $0, -56(%rbp)
  movq    -48(%rbp), %rcx
  testq   %rcx, %rcx
  je      LBB0_9

  ; compute (r*(r-1)) / 2
  ; (closed form of summation)
  ; and store into %rdx
  leaq    -1(%rcx), %rax
  leaq    -2(%rcx), %rdx
  mulq    %rdx
  shldq   $63, %rax, %rdx
  addq    %rcx, %rdx

  ; loop r times doing nothing
LBB0_7:
  decq    %rcx
  jne     LBB0_7

  ; store final result of summation
  ; and move on
  decq    %rdx
  movq    %rdx, -56(%rbp)

LBB0_9:
  ...
</code></pre>
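
<p>The arithmetic in that listing is worth checking by hand.  The code
computes <code>(r-1)*(r-2)/2</code> using a 128-bit product, then adds <code>r</code> and
subtracts <code>1</code>; since <code>(r-1)*(r-2)/2 + (r-1) = (r-1)*r/2</code>, that is the
usual closed form for the sum of <code>0</code> up to <code>r-1</code>.  The sketch below, in
present-day Rust with made-up function names, mirrors those steps and
compares them against a plain loop.  For <code>r = 10000000000</code> it prints
<code>13106511847580896768</code>, the wrapped 64-bit value from the benchmark
output.</p>

<pre><code>// Sum of 0..r, wrapping at 64 bits, computed the slow way.
fn sum_loop(r: u64) -&gt; u64 {
    let mut sum = 0u64;
    for i in 0..r {
        sum = sum.wrapping_add(i);
    }
    sum
}

// The same value via the steps in the listing (for r &gt;= 1):
// a 128-bit product, a shift right by one, then `+ r` and `- 1`.
fn sum_like_asm(r: u64) -&gt; u64 {
    let a = r.wrapping_sub(1) as u128;
    let b = r.wrapping_sub(2) as u128;
    let half = ((a * b) &gt;&gt; 1) as u64; // mulq followed by shldq $63
    half.wrapping_add(r).wrapping_sub(1) // addq, then the final decq
}

fn main() {
    for r in 1..1000u64 {
        assert_eq!(sum_loop(r), sum_like_asm(r));
    }
    println!("{}", sum_like_asm(10_000_000_000));
}
</code></pre>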

<h3>Benchmark #2: <code>vec::iter</code></h3>

<p>Well, that benchmark was fun but since LLVM got so smart it&#8217;s not as
interesting as I&#8217;d like.  So I wrote up another one that uses
<code>vec::iter()</code>.  This will also have the added benefit of showing off
Marijn&#8217;s work on monomorphization, which optimizes our treatment of
generic functions.  The example is basically the same as the previous
one, but it uses vectors:</p>

<pre><code>fn main(args: [str]) {
    let r = option::get(uint::from_str(args[1]));
    let v = vec::enum_uints(0u, r);

    let start = std::time::precise_time_s();

    let sum = 0u;
    vec::iter(v) {|i|
        sum += i;
    }

    let end = std::time::precise_time_s();
    io::print(#fmt["Sum from 0 to %u is %u\n", r, sum]);
    io::print(#fmt["time: %3.3f s\n", end - start]);
}
</code></pre>

<p>Unfortunately, the time to execute is largely dominated by building up
the vector of integers we&#8217;re going to iterate over, so I added some
measurements of the time spent iterating to get a better idea of the
effects of inlining.</p>

<p>Before we dig into the generated assembly, let&#8217;s look at the measurements:</p>

<pre><code>;rustc -O --inline --monomorphize ~/tmp/iterator_vec.rs
;~/tmp/iterator_vec 100000000
Sum from 0 to 100000000 is 5000000050000000
time: 0.140 s
;rustc -O ~/tmp/iterator_vec.rs -o ~/tmp/iterator_vec-no-inline
;~/tmp/iterator_vec-no-inline 100000000
Sum from 0 to 100000000 is 5000000050000000
time: 1.183 s
</code></pre>

<p>Woohoo, the non-inlined version took about 8 times as long.  That&#8217;s
satisfying.  More satisfying, in a way, than the 3000x improvement
from before, since it suggests we&#8217;re genuinely doing things better and
not just winning by a kind of trick.</p>

<p>(Sharp-eyed readers may have noticed that the results of the summation
are different than before.  This is because <code>vec::enum_uints()</code>
generates a vector of <code>i</code> such that <code>0 &lt;= i &lt;= N</code> whereas
<code>uint::range()</code> explores the range <code>0 &lt;= i &lt; N</code>.  Yay for
consistency.)</p>
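
<p>The off-by-one between the two conventions is easy to see with a small
value of <code>N</code>.  The following is a sketch in present-day Rust, where the
two kinds of range are written <code>0..n</code> and <code>0..=n</code>; the function names
are invented for the example.</p>

<pre><code>// Half-open range, like `uint::range()`: 0 &lt;= i &lt; n.
fn sum_half_open(n: u64) -&gt; u64 {
    (0..n).sum()
}

// Closed range, like `vec::enum_uints()`: 0 &lt;= i &lt;= n.
fn sum_closed(n: u64) -&gt; u64 {
    (0..=n).sum()
}

fn main() {
    // The two sums differ by exactly n.
    println!("{} {}", sum_half_open(100), sum_closed(100));
}
</code></pre>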

<h4>Defining <code>vec::iter</code></h4>

<p>Before we look at the assembly, let&#8217;s see how <code>vec::iter()</code> is defined:</p>

<pre><code>#[inline(always)]
fn iter&lt;T&gt;(v: [const T], f: fn(T)) {
    unsafe {
        let mut n = vec::len(v);
        let mut p = unsafe::to_ptr(v);
        while n &gt; 0u {
            f(*p);
            p = ptr::offset(p, 1u);
            n -= 1u;
        }
    }
}
</code></pre>

<p>This implementation makes use of pointer arithmetic
contained within an unsafe block.  It&#8217;s basically
equivalent to the following C++-ish code:</p>

<pre><code>template&lt;class T&gt;
void iter(vec&lt;T&gt; vec, void (*f)(T&amp;)) {
    size_t n = len(vec);
    T *p = data(vec);
    while (n &gt; 0) {
       f(*p);
       p += 1;
       n -= 1;
    }
}
</code></pre>
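
<p>For comparison, a similar pointer-walking loop can be written in
present-day Rust.  This is a sketch rather than how the standard
library iterates today, and it assumes the modern slice API
(<code>as_ptr()</code> and <code>add()</code>) in place of <code>unsafe::to_ptr()</code> and
<code>ptr::offset()</code>.</p>

<pre><code>#[inline(always)]
fn iter&lt;T&gt;(v: &amp;[T], mut f: impl FnMut(&amp;T)) {
    let mut n = v.len();
    let mut p = v.as_ptr();
    while n &gt; 0 {
        // SAFETY: `p` starts at the first element and is advanced at
        // most `v.len()` times, so it is in bounds whenever it is read.
        unsafe {
            f(&amp;*p);
            p = p.add(1);
        }
        n -= 1;
    }
}

fn main() {
    let v = vec![1u64, 2, 3, 4];
    let mut sum = 0;
    iter(&amp;v, |x| sum += *x);
    println!("{}", sum);
}
</code></pre>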

<h4>Generated assembly</h4>

<p>OK, now let&#8217;s look at the assembly.  We&#8217;ll see that we&#8217;re generating
pretty decent code.  One thing that could perhaps be improved is that
the call to <code>unsafe::to_ptr()</code> does not appear to have been inlined
despite the fact that its definition is marked as <code>#[inline(always)]</code>.
Not sure why that is.  Another thing (which may be related) is that
<code>p</code> is not kept in a register but rather reloaded from the stack on
each iteration of the loop.  But I&#8217;m not sure how significant that is when the
effects of caching and so forth are taken into account.</p>

<p>One interesting thing is that LLVM converts the loop from one which
counts down to a loop which counts up.  It does this by first negating
<code>n</code>.  I&#8217;m not sure why this should be faster; I guess that it lets you
generate more compact instructions somehow or perhaps enables other
optimizations later on.  Can&#8217;t say I&#8217;ve ever looked into these kinds of
micro-optimizations around loop counters in detail.</p>

<pre><code>Ltmp7:
    ; Initialize sum to 0:
    movq    $0, -80(%rbp)

    ; let n = vec::len(v);
    movq    -64(%rbp), %rdx
    movq    (%rdx), %rbx

    ; Compute p and store it into -48(%rbp)
    ; (Note: first argument to `unsafe::to_ptr()`
    ;  is the location to write the output)
    leaq    -48(%rbp), %rdi
    callq   __ZN3vec6unsafe8to_ptr1217_f332097e13dd07e5E

Ltmp9:
    ; Convert size from bytes into indices:
    shrq    $3, %rbx
    testq   %rbx, %rbx
    je      LBB0_12

    ; Convert counter to -n:
    negq    %rbx

    ; Zero out the sum, which will be held in %rax:
    xorl    %eax, %eax

LBB0_10:
    ; Load *p and add to the sum:
    movq    -48(%rbp), %rcx
    addq    (%rcx), %rax

    ; p++
    addq    $8, -48(%rbp)

    ; n++, stop when we reach zero:
    incq    %rbx
    jne     LBB0_10

    ; Move sum from %rax into its home on the stack:
    movq    %rax, -80(%rbp)

LBB0_12:
    ...
</code></pre>

<h3>Goodbye!</h3>

<p>I hope you enjoyed this little dive into our code generation.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Serialization without type information via impls]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/02/25/serialization-without-type-information-via-impls/"/>
    <updated>2012-02-25T07:40:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/02/25/serialization-without-type-information-via-impls</id>
    <content type="html"><![CDATA[<p>My current implementation of the auto-serialization code generator
requires full type information.  This is a drag.  First, macros and
syntax extensions currently run before the type checker, so requiring
full type information prevents the auto-serialization code from being
implemented in the compiler, as it should be.  At first I wanted to
change how the compiler works to provide type information, but after
numerous discussions with pcwalton and dherman, I&#8217;ve come to the
conclusion that this is a bad idea: it requires exposing an API for
the AST and for type information and introduces numerous other
complications.</p>

<p>I&#8217;ve come up with an alternative design that seems to solve this
problem.  It also addresses another concern I had: how do you allow
users to customize the (de)-serialization for a given type without
forcing them to customize (de)-serialization for all types?  One
interesting aspect of this plan, though, is that it requires
non-hygienic macros.</p>

<p>My basic plan is to allow type declarations to be decorated with a tag
like <code>#[auto_serialize]</code>, which will look something like this:</p>

<pre><code>#[auto_serialize]
type spanned&lt;T&gt; = { node: T, span: span };
</code></pre>

<p>Here I have deliberately chosen a generic type declaration to use as
my running example because (as we shall see) generic types are
particularly complex.  Then a pass will run in the compiler which finds all types
annotated with <code>#[auto_serialize]</code> and generates serialization and
deserialization code that lives alongside the declaration.  Let&#8217;s look
first at serialization and then at deserialization: as we shall see,
the solution that we use for serialization doesn&#8217;t quite work for
deserialization, so we have to handle them slightly differently.</p>

<h2>Serialization</h2>

<p>My original concern was, without type information, how do I know how
to serialize the contents of the type?  After all, all I have is the
AST, so I know some names but that&#8217;s it.  In the case of <code>spanned&lt;T&gt;</code>,
for example, I know there are two fields, one with the type <code>T</code> and
one with the type <code>span</code>.  I can figure out that <code>T</code> is a type
parameter, but I don&#8217;t know that <code>span</code> is an import of
<code>syntax::codemap::span</code>, and I certainly don&#8217;t know that
<code>syntax::codemap::span</code> is defined as a record itself.</p>

<p>So how do I generate code to serialize a type like <code>T</code> or <code>span</code>
without knowing anything about what that type is?  It turns out that we
have a nice language tool for doing that: ifaces and impls (a.k.a.,
typeclasses).</p>

<p>So, for <code>spanned&lt;T&gt;</code>, I will generate something like:</p>

<pre><code>impl of serializable&lt;T: serializable&gt; for spanned&lt;T&gt; {
    fn serialize&lt;S: serialization::serializer&gt;(s: S) {
        s.emit_rec {||
            s.emit_rec_field("node", 0u) {||
                self.node.serialize(s);
            }
            s.emit_rec_field("span", 1u) {||
                self.span.serialize(s);
            }
        }
    }
}
</code></pre>

<p>You can see that generating this code does not require any information
that is external to the type declaration.  It just assumes that, for
example, there will be a suitable implementation of the <code>serialize()</code>
method for the field <code>self.span</code>.  Similarly, by parameterizing the
<code>impl</code> with the type <code>T</code> and specifying that <code>T</code> must itself be
serializable, we can make the same assumption for the field
<code>self.node</code>.  Pretty nifty.</p>

<p>One very appealing aspect of this is that if I wanted to make custom
serialization code for the type <code>span</code>, say, I could just write my own
<code>impl</code> for the serialize method.  The auto-generated code for
serializing <code>spanned&lt;T&gt;</code> would then link to my custom code, no
problem.  Similarly, I can write custom code that uses auto-generated
code without difficulty.</p>
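
<p>In present-day Rust, where ifaces and impls have become traits, the
same pattern can be sketched as below.  Everything here is
illustrative: the <code>Serializer</code> trait is a cut-down, made-up interface
(flat calls rather than nested closures), and <code>Trace</code> is a toy
serializer that only records what it was asked to emit.</p>

<pre><code>// Hypothetical serializer interface; a real one would be richer.
trait Serializer {
    fn emit_uint(&amp;mut self, v: usize);
    fn emit_rec_field(&amp;mut self, name: &amp;str, idx: usize);
}

trait Serializable {
    fn serialize&lt;S: Serializer&gt;(&amp;self, s: &amp;mut S);
}

struct Span { lo: usize, hi: usize }
struct Spanned&lt;T&gt; { node: T, span: Span }

impl Serializable for usize {
    fn serialize&lt;S: Serializer&gt;(&amp;self, s: &amp;mut S) { s.emit_uint(*self); }
}

// Hand-written impl for `Span`; the generated code below calls it
// without knowing anything about how `Span` is defined.
impl Serializable for Span {
    fn serialize&lt;S: Serializer&gt;(&amp;self, s: &amp;mut S) {
        s.emit_uint(self.lo);
        s.emit_uint(self.hi);
    }
}

// What a generator could emit from the declaration of `Spanned&lt;T&gt;` alone.
impl&lt;T: Serializable&gt; Serializable for Spanned&lt;T&gt; {
    fn serialize&lt;S: Serializer&gt;(&amp;self, s: &amp;mut S) {
        s.emit_rec_field("node", 0);
        self.node.serialize(s);
        s.emit_rec_field("span", 1);
        self.span.serialize(s);
    }
}

// Toy serializer that records a textual trace.
struct Trace(Vec&lt;String&gt;);

impl Serializer for Trace {
    fn emit_uint(&amp;mut self, v: usize) { self.0.push(v.to_string()); }
    fn emit_rec_field(&amp;mut self, name: &amp;str, idx: usize) {
        self.0.push(format!("{}#{}", name, idx));
    }
}

fn main() {
    let x = Spanned { node: 7usize, span: Span { lo: 1, hi: 4 } };
    let mut t = Trace(Vec::new());
    x.serialize(&amp;mut t);
    println!("{}", t.0.join(" "));
}
</code></pre>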

<h2>Deserialization</h2>

<p>However, this approach does not work for deserialization.  After all,
we can&#8217;t invoke something like <code>data.deserialize(d)</code>, as the data is
what we are trying to produce!</p>

<p>Therefore, we will generate a different pattern for deserialization.
It will look something like this:</p>

<pre><code>fn deserialize_spanned&lt;D: serialization::deserializer,T&gt;
   (d: D, t: fn() -&gt; T) -&gt; spanned&lt;T&gt; {

   d.read_rec {||
       {
           node: d.read_rec_field("node", 0u) {|| t() },
           span: d.read_rec_field("span", 1u) {|| deserialize_span(d) }
       }
   }
}
</code></pre>

<p>Here, we generate a <code>deserialize_X()</code> function where <code>X</code> is the
(unqualified) name of the type being deserialized. The number of
arguments expected by this <code>deserialize_X()</code> function varies: the
first argument is always a deserializer, but then there are additional
arguments for any type arguments.  These parameters are dealt with
implicitly when using ifaces and impls, but since that machinery won&#8217;t
work for us we have to thread it through manually now.</p>

<p>More interesting than the case of the field <code>node</code>, actually, is the
field <code>span</code>: here, we don&#8217;t even <em>try</em> to resolve the identifier, we
just generate a dangling reference to a function <code>deserialize_span()</code>
and we assume that the user has either imported this function or
defined it locally.  This is where the lack of hygiene is required.</p>

<p>Some other cases that don&#8217;t appear here:</p>

<ul>
<li><p>if the type of a field is a path like <code>a::b::c</code>, then we generate a
call to a function like <code>a::b::deserialize_c(d)</code>.</p></li>
<li><p>if the type of a field is parameterized, like <code>spanned&lt;item_&gt;</code>, then
we generate a call like <code>deserialize_spanned(d, {||
deserialize_item_(d) })</code>, where the sugared closure <code>{||...}</code>
represents the code to unpack the type argument.</p></li>
</ul>
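
<p>Here is a rough sketch of the <code>deserialize_X()</code> convention in
present-day Rust.  The <code>Reader</code> type is a made-up deserializer that
just reads integers from a list.  One deliberate difference from the
code above: the closure for the type argument receives the
deserializer as a parameter instead of capturing it, which is an
adaptation to keep the modern borrow checker happy.</p>

<pre><code>// Hypothetical deserializer that reads unsigned integers from a list.
struct Reader { data: Vec&lt;usize&gt;, pos: usize }

impl Reader {
    fn read_uint(&amp;mut self) -&gt; usize {
        let v = self.data[self.pos];
        self.pos += 1;
        v
    }
}

#[derive(Debug, PartialEq)]
struct Span { lo: usize, hi: usize }

#[derive(Debug, PartialEq)]
struct Spanned&lt;T&gt; { node: T, span: Span }

fn deserialize_uint(d: &amp;mut Reader) -&gt; usize { d.read_uint() }

fn deserialize_span(d: &amp;mut Reader) -&gt; Span {
    let lo = d.read_uint();
    let hi = d.read_uint();
    Span { lo, hi }
}

// One extra argument per type parameter: `t` knows how to read a `T`.
fn deserialize_spanned&lt;T&gt;(
    d: &amp;mut Reader,
    t: impl FnOnce(&amp;mut Reader) -&gt; T,
) -&gt; Spanned&lt;T&gt; {
    let node = t(&amp;mut *d); // reborrow so `d` stays usable afterwards
    let span = deserialize_span(d);
    Spanned { node, span }
}

fn main() {
    let mut d = Reader { data: vec![7, 1, 4, 9, 2, 3, 5, 6], pos: 0 };
    // A plain type argument:
    let a = deserialize_spanned(&amp;mut d, deserialize_uint);
    // A parameterized type argument is unpacked by a closure:
    let b = deserialize_spanned(&amp;mut d, |d| deserialize_spanned(d, deserialize_uint));
    println!("{:?}\n{:?}", a, b);
}
</code></pre>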


<h2>Feedback</h2>

<p>I am 100% positive people have solved this problem before in a million
ways, no doubt including this one.  Am I missing something obvious?
Also, would it be better to avoid using iface/impl for serialization
and just generate functions named <code>serialize_X()</code> just as I do with
<code>deserialize_X()</code>? I thought it&#8217;d be nice if the serialization were as
natural to write as possible, but I guess that if you have to write
custom serialization code, you generally need custom deserialization
code too, so it doesn&#8217;t help so much.</p>
]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Regions tradeoffs]]></title>
    <link href="http://smallcultfollowing.com/babysteps/blog/2012/02/22/regions-tradeoffs/"/>
    <updated>2012-02-22T17:35:00-08:00</updated>
    <id>http://smallcultfollowing.com/babysteps/blog/2012/02/22/regions-tradeoffs</id>
    <content type="html"><![CDATA[<p>In the last few posts I&#8217;ve been discussing various options for
regions.  I&#8217;ve come to see region support as a kind of continuum,
where the current system of reference modes lies at one end and a
full-blown region system with explicit parameterized types and
user-defined memory pools lies at the other.  In between there are
various options.  To better explore these tradeoffs, I wrote up a
document that
<a href="http://smallcultfollowing.com/babysteps/rust/regions-tradeoffs">outlines various possible schemes and also details use cases that are enabled by these schemes</a>.
I don&#8217;t claim this to be a comprehensive list of all possible schemes,
just the ones I&#8217;ve thought about so far.  In some cases, the
descriptions are quite hand-wavy.  I also think some of them don&#8217;t
hang together so well.</p>
]]></content>
  </entry>
  
</feed>

