Associated items continued
I want to [finish my discussion of associated items][pp] by taking a look at how they are handled in Haskell, and what that might mean in Rust.
These proposals have the same descriptive power as what I described before, but they are backwards compatible. This is nice.
Object-oriented style name resolution
In the object-oriented, C++-like version of associated items that I introduced before, the names of associated items and methods were resolved relative to a type. To see what I mean by this, consider a (slightly expanded) variant the graph example I introduced before:
trait Graph {
type Node; // associated type
static K: uint; // associated constant
}
mod graph {
fn depth_first_search<G: Graph>(
graph: &G) -> ~[G::Node]
{
let k = G::K;
...
}
}
Consider a path like graph::depth_first_search
, which names an item
within a module. This kind of path is based solely on the module
hierarchy and can be resolved without knowing anything types
whatsoever.
Now consider the paths G::Node
and G::K
that appear in
depth_first_search
. These paths look similar to in form to
graph::depth_first_search
, but they are resolved quite
differently. Because G
is a type and not a module, if we want to
figure out what names Node
and K
refer to, we don’t examine the
module hierarchy but rather the properties of the type G
. In this
case, G
is a type parameter that implements the Graph
trait, and
the Graph
trait defines an associated type Node
and an associated
constant K
.
Note that the name lookup process here is exactly analogous to what
happens on a method call. With an expression like a.b()
, the meaning
of the name b
is resolved by first examining the type of the
expression a
and then checking to see what methods that type
offers. The module hierarchy is not consulted.
The object-oriented style of naming specification is not fully
explicit. In particular, a path like G::Node
does not specify the
trait in which Node
was defined, the compiler must figure it out.
It is also possible that the type G
implements multiple traits that
have an associated item Node
, so the syntax could be ambiguous. To
make things fully explicit, I proposed in my previous post that the
full syntax would be Type::(Trait::Item)
. So the fully explicit
form of G::Node
would be G::(Graph::Node)
, since the type Node
is defined in the trait Graph
.
Functional-style name resolution (take 1)
In Haskell, all name resolution is done based on lexical scoping and the module hierarchy. This is no accident. It means that the Haskell compiler can figure out what each name in a program refers to without knowing anything about the types involved, which is helpful when performing aggressive type inference.
What this means is that we can’t use a path like G::Node
to mean
“the type Node
relative to the type G
”, because interpreting this
path would require examining the definition of the type G
(as we saw
before). Instead, if we were to use a syntax analogous to what Haskell
uses, one would write something like Graph::Node<G>
. Note that all
the names here (Graph::Node
, G
) can be resolved using only the
module hierarchy.
So the example I gave before would look as follows:
trait Graph {
type Node; // associated type
static K: uint; // associated constant
}
mod graph {
fn depth_first_search<G: Graph>(
graph: &G) -> ~[Graph::Node<G>]
{
let k = Graph::K::<G>;
...
}
}
Note that where before we wrote G::K
to refer to the constant K
associated with the type G
, we would now write Graph::K::<G>
. As
is typical in Rust, the extra ::
that appears before the type
parameter <G>
is necessary to avoid parsing ambiguities when the
path appears as part of an expression.
Let’s look a bit more closely at what’s going on here. Effectively
what is happening is that, for each associated item within a trait, we
are adding a synthetic type parameter. For any reference to an
associated item, this type parameter tells the compiler which type is
implementing the trait. The path Graph::Node
by itself is not
complete; Graph::Node<G>
means “the type Node
defined for the type
G
”.
Let’s dig into it this Haskell-style convention a bit to see some of the implications.
Return type inference
One benefit of the Haskell style convention is that the values for
the type parameters can often be deduced by inference. For example,
let’s return to the trait FromStr
from my [previous post][pp].
The trait FromStr
is used to parse a string and produce a value of
some other type:
trait FromStr {
fn parse(input: &str) -> Self;
}
We might implement FromStr
for unsigned integers as follows:
impl FromStr for uint {
fn parse(input: &str) -> uint {
uint::parse(input, 10) // 10 is the radix
}
}
Now we could write a function that invokes parse()
like so:
fn parse_strings(v: &[&str]) -> ~[uint] {
v.map(|s| FromStr::parse(*s))
}
Note that when we called FromStr::parse(*s)
, we did not say what
type it should parse to. The compiler was able to infer that we wanted
to parse a string into a uint
based on the return type of
parse_strings()
as a whole. A fully explicit version of parse_strings
would look like:
fn parse_strings(v: &[&str]) -> ~[uint] {
v.map(|s| FromStr::parse::<uint>(*s))
// ^~~~~~ specify return type
}
Generic traits
Imagine that we have a generic trait, like this Add
trait:
trait Add<Rhs> {
type Sum;
fn add(&self, r: &Rhs) -> Sum<Self, Rhs>;
}
This trait is very similar to the trait used in Rust to implement
operator overloading, except it has been adapted to use an associated
type for the Sum
(which is probably how the Rust type should be
defined as well, since the type of the sum ought to be determined by
the types of the things being added).
Previously, with the associated type Node
, we said that any
reference to node had to include a single type parameter to indicate
the type that was implementing Graph
. But with a generic trait like
Add
a simple type parameter is not enough. To fully specify all the
types involved, we need to include both the Self
type and any type
parameters. This is why the return type of the method add()
is
Sum<Self, Rhs>
—a mere reference to Sum
or Sum<Self>
would be incomplete.
Comparison to object-oriented form. Interestingly, this case is
something that the object-oriented style of naming cannot handle very
well. This is because the object-oriented convention is strongly
oriented towards specifying the Self
type but does not easily expand
to accommodate generic traits. Using the fully explicit syntax that I
suggested in my [previous post][pp], I think the result would look
like Self::(Add<Rhs>::Sum)
. (It’s plausible that, within the trait
definition itself, one could simply write Sum
, but from outside the
trait definition I think it would be necessary to specify the full
type parameters).
Generic items
It is also possible to have an associated item which itself has type parameters. For example, we might want to have a graph where kind of node can carry its own userdata:
trait Graph {
type Node<B>;
fn get_node_userdata<B>(n: &Node<Self, B>) -> B;
}
Here we see that when we refer to the Node
type in
get_node_userdata
, we specify both the Self
type parameter and the
type parameters defined on Node
itself. I think this is a bit
surprising.
Comparison to object-oriented form. The object-oriented naming
scheme handles this case very naturally. For example, get_node_userdata()
would be declared as follows:
fn get_node_userdata<B>(n: &Self::Node<B>) -> B;
Functional-style name resolution (take 2)
In the previous section we added implicit type parameters to each associated item. Particularly in the cases of generic traits or generic items, this can be a bit confusing. You wind up mixing type parameters that were declared on the trait together with type parameters declared on the item.
An alternative that would be a bit more explicit is to (1) designate
the implicit parameter for the trait’s Self type using a special
keyword, such as for
or self
(I prefer for
since it echoes the
impl Trait for Type
form) and (2) push the trait type parameters
into the path itself. So instead of writing Graph::Node<G>
you
would write Graph::Node<for G>
, and instead of Add::Sum<Lhs, Rhs>
you would write Add<Rhs>::Sum<for Lhs>
. You’ll see more examples of
how this looks in the next section.
Conclusion: Comparing the conventions
I think none of these conventions is perfect. Each has cases where it is a bit counterintuitive or ugly. To try and make the comparison easier, I’m going to create a table summarizing the object-oriented, functional 1, and functional 2 styles, and show how each syntax looks for each of the use cases I identified in this post. For each use case, I’ll provide both the shortest possible form and the fully explicit variant.
G::Node | Node<G> | Node<for G> |
G::(Graph::Node) | Graph::Node<G> | Graph::Node<for G> |
G::K | K::<G> | K::<for G> |
G::(Graph::K) | Graph::K::<G> | Graph::K::<for G> |
uint::parse() | parse() | parse() |
uint::(Graph::parse()) | FromStr::parse::<uint>() | FromStr::parse()::<for uint> |
Self::(Add<Rhs>::Sum) | Sum<Self,Rhs> | Add<Rhs>::Sum<for Self> |
Self::(Add<Rhs>::Sum) | Add::Sum<Self,Rhs> | Add<Rhs>::Sum<for Self> |
Self::Node<B> | Node<Self,B> | Node<B for Self> |
Self::(Graph::Node<B>) | Graph::Node<Self,B> | Graph::Node<B for Self> |