So aturon wrote this beautiful post about what a good week it has been. In there, they wrote:

Breakthrough #2: @nikomatsakis had a eureka moment and figured out a path to make specialization sound, while still supporting its most important use cases (blog post forthcoming!). Again, this suddenly puts specialization on the map for Rust Epoch 2018.

Sheesh I wish they hadn’t written that! Now the pressure is on. Well, here goes nothing =).

Anyway, I’ve been thinking about the upcoming Rust Epoch. We’ve been iterating over the final list of features to be included and I think it seems pretty exciting. But there is one “fancy type system” feature that’s been languishing for some time: specialization. Accepted to much fanfare as RFC 1210, we’ve been kind of stuck since then trying to figure out how to solve an underlying soundness challenge.

As aturon wrote, I think (and emphasis on think!) I may have a solution. I call it the always applicable rule, but you might also call it maximally minimal specialization1.

Let’s be clear: this proposal does not support all the specialization use cases originally envisioned. As the phrase maximally minimal suggests, it works by focusing on a core set of impls and accepting those. But that’s better than most of its competitors! =) Better still, it leaves a route for future expansion.

The soundness problem

I’ll just cover the soundness problem very briefly; Aaron wrote an excellent blog post that covers the details. The crux of the problem is that code generation wants to erase regions, but the type checker doesn’t. This means that we can write specialization impls that depend on details of lifetimes, but we have no way to test at code generation time if those more specialized impls apply. A very simple example would be something like this:

impl<T> Trait for T { }
impl Trait for &'static str { }

At code generation time, all we know is that we have a &str – for some lifetime. We don’t know if it’s a static lifetime or not. The type checker is supposed to have assured us that we don’t have to know – that this lifetime is “big enough” to cover all the uses of the string.

My proposal would reject the specializing impl above. I basically aim to solve this problem by guaranteeing that, just as today, code generation doesn’t have to care about specific lifetimes, because it knows that – whatever they are – if there is a potentially specializing impl, it will be applicable.

The “always applicable” test

The core idea is to change the rule for when overlap is allowed. In RFC 1210 the rule is something like this:

  • Distinct impls A and B are allowed to overlap if one of them specializes the other.

We have long intended to extend this via the idea of intersection impls, giving rise to a rule like:

  • Two distinct impls A and B are allowed to overlap if, for all types in their intersection:
    • there exists an applicable impl C and C specializes both A and B.2

My proposal is to extend that intersection rule with the always applicable test. I’m actually going to start with a simple version, and then I’ll discuss an important extension that makes it much more expressive.

  • Two distinct impls A and B are allowed to overlap if, for all types in their intersection:
    • there exists an applicable impl C and C specializes both A and B,
    • and that impl C is always applicable.

(We will see, by the way, that the precise definition of the specializes predicate doesn’t matter much for the purposes of my proposal here – any partial order will do.)

When is an impl always applicable?

Intuitively, an impl is always applicable if it does not impose any additional conditions on its input types beyond that they be well-formed – and in particular it doesn’t impose any equality constraints between parts of its input types. It also has to be fully generic with respect to the lifetimes involved.

Actually, I think the best way to explain it is in terms of the implied bounds proposal3 (RFC, blog post). The idea is roughly this: an impl is always applicable if it meets three conditions:

  • it relies only on implied bounds,
  • it is fully generic with respect to lifetimes,
  • it doesn’t repeat generic type parameters.

Let’s look at those three conditions.

Condition 1: Relies only on implied bounds.

Here is an example of an always applicable impl (which could therefore be used to specialize another impl):

struct Foo<T: Clone> { }

impl<T> SomeTrait for Foo<T> { 
  // code in here can assume that `T: Clone` because of implied bounds
}

Here the impl works fine, because it adds no additional bounds beyond the T: Clone that is implied by the struct declaration.

If the impl adds new bounds that are not part of the struct, however, then it is not always applicable:

struct Foo<T: Clone> { }

impl<T: Copy> SomeTrait for Foo<T> { 
  // ^^^^^^^ new bound not declared on `Foo`,
  //         hence *not* always applicable
}

Condition 2: Fully generic with respect to lifetimes.

Each lifetime used in the impl header must be a lifetime parameter, and each lifetime parameter can only be used once. So an impl like this is always applicable:

impl<'a, 'b> SomeTrait for &'a &'b u32 {
  // implied bounds let us assume that `'b: 'a`, as well
}

But the following impls are not always applicable:

impl<'a> SomeTrait for &'a &'a u32 {
                   //  ^^^^^^^ same lifetime used twice
}

impl SomeTrait for &'static str {
                //  ^^^^^^^ not a lifetime parmeter
}

Condition 3: Each type parameter can only be used once.

Using a type parameter more than once imposes “hidden” equality constraints between parts of the input types which in turn can lead to equality constraints between lifetimes. Therefore, an always applicable impl must use each type parameter only once, like this:

impl<T, U> SomeTrait for (T, U) {
}

Repeating, as here, means the impl cannot be used to specialize:

impl<T> SomeTrait for (T, T) {
  //                   ^^^^
  // `T` used twice: not always applicable
}

How can we think about this formally?

For each impl, we can create a Chalk goal that is provable if it is always applicable. I’ll define this here “by example”. Let’s consider a variant of the first example we saw:

struct Foo<T: Clone> { }

impl<T: Clone> SomeTrait for Foo<T> { 
}

As we saw before, this impl is always applicable, because the T: Clone where clause on the impl follows from the implied bounds of Foo<T>.

The recipe to transform this into a predicate is that we want to replace each use of a type/region parameter in the input types with a universally quantified type/region (note that the two uses of the same type parameter would be replaced with two distinct types). This yields a “skolemized” set of input types T. When check if the impl could be applied to T.

In the case of our example, that means we would be trying to prove something like this:

// For each *use* of a type parameter or region in
// the input types, we add a 'forall' variable here.
// In this example, the only spot is `Foo<_>`, so we
// have one:
forall<A> {
  // We can assume that each of the input types (using those
  // forall variables) are well-formed:
  if (WellFormed(Foo<A>)) {
    // Now we have to see if the impl matches. To start,
    // we create existential variables for each of the
    // impl's generic parameters:
    exists<T> {
      // The types in the impl header must be equal...
      Foo<T> = Foo<A>,
      // ...and the where clauses on the impl must be provable.
      T: Clone,
    }
  }
} 

Clearly, this is provable: we infer that T = A, and then we can prove that A: Clone because it follows from WellFormed(Foo<A>). Now if we look at the second example, which added T: Copy to the impl, we can see why we get an error. Here was the example:

struct Foo<T: Clone> { }

impl<T: Copy> SomeTrait for Foo<T> { 
  // ^^^^^^^ new bound not declared on `Foo`,
  //         hence *not* always applicable
}

That example results in a query like:

forall<A> {
  if (WellFormed(Foo<A>)) {
    exists<T> {
      Foo<T> = Foo<A>,
      T: Copy, // <-- Not provable! 
    }
  }
} 

In this case, we fail to prove T: Copy, because it does not follow from WellFormed(Foo<A>).

As one last example, let’s look at the impl that repeats a type parameter:

impl<T> SomeTrait for (T, T) {
  // Not always applicable
}

The query that will result follows; what is interesting here is that the type (T, T) results in two forall variables, because it has two distinct uses of a type parameter (it just happens to be one parameter used twice):

forall<A, B> {
  if (WellFormed((A, B))) {
    exists<T> {
      (T, T) = (A, B) // <-- cannot be proven
    }
  }
} 

What is accepted?

What this rule primarily does it allow you to specialize blanket impls with concrete types. For example, we currently have a From impl that says any type T can be converted to itself:

impl<T> From<T> for T { .. }

It would be nice to be able to define an impl that allows a value of the never type ! to be converted into any type (since such a value cannot exist in practice:

impl<T> From<!> for T { .. }

However, this impl overlaps with the reflexive impl. Therefore, we’d like to be able to provide an intersection impl defining what happens when you convert ! to ! specifically:

impl From<!> for ! { .. }

All of these impls would be legal in this proposal.

Extension: Refining always applicable impls to consider the base impl

While it accepts some things, the always applicable rule can also be quite restrictive. For example, consider this pair of impls:

// Base impl:
impl<T> SomeTrait for T where T: 'static { }
// Specializing impl:
impl SomeTrait for &'static str { }

Here, the second impl wants to specialize the first, but it is not always applicable, because it specifies the 'static lifetime. And yet, it feels like this should be ok, since the base impl only applies to 'static things.

We can make this notion more formal by expanding the property to say that the specializing impl C must be always applicable with respect to the base impls. In this extended version of the predicate, the impl C is allowed to rely not only on the implied bounds, but on the bounds that appear in the base impl(s).

So, the impls above might result in a Chalk predicate like:

// One use of a lifetime in the specializing impl (`'static`),
// so we introduce one 'forall' lifetime:
forall<'a> {
  // Assuming the base impl applies:
  if (exists<T> { T = &'a str, T: 'static }) {
      // We have to prove that the
      // specialized impls type's can unify:
      &'a str = &'static str
    }
  }
} 

As it happens, the compiler today has logic that would let us deduce that, because we know that &'a str: 'static, then we know that 'a = 'static, and hence we could solve this clause successfully.

This rule also allows us to accept some cases where type parameters are repeated, though we’d have to upgrade chalk’s capability to let it prove those predicates fully. Consider this pair of impls from RFC 1210:

// Base impl:
impl<E, T> Extend<E, T> for Vec<E> where T: IntoIterator<Item=E> {..}
// Specializing impl:
impl<'a, E> Extend<E, &'a [E]> for Vec<E> {..}
               //  ^       ^           ^ E repeated three times!

Here the specializing impl repeats the type parameter E three times! However, looking at the base impl, we can see that all of those repeats follow from the conditions on the base impl. The resulting chalk predicate would be:

// The fully general form of specializing impl is
// > impl<A,'b,C,D> Extend<A, &'b [C]> for Vec<D>
forall<A, 'b, C, D> {
  // Assuming the base impl applies:
  if (exists<E, T> { E = A, T = &'b [B], Vec<D> = Vec<E>, T: IntoIterator<Item=E> }) {
    // Can we prove the specializing impl unifications?
    exists<'a, E> {
      E = A,
      &'a [E] = &'b [C],
      Vec<E> = Vec<D>,
    }
  }
} 

This predicate should be provable – but there is a definite catch. At the moment, these kinds of predicates fall outside the “Hereditary Harrop” (HH) predicates that Chalk can handle. HH predicates do not permit existential quantification and equality predicates as hypotheses (i.e., in an if (C) { ... }). I can however imagine some quick-n-dirty extensions that would cover these particular cases, and of course there are more powerful proving techniques out there that we could tinker with (though I might prefer to avoid that).

Extension: Reverse implied bounds rules

While the previous examples ought to be provable, there are some other cases that won’t work out without some further extension to Rust. Consider this pair of impls:

impl<T> Foo for T where T: Clone { }
impl<T> Foo for Vec<T> where T: Clone { }

Can we consider this second impl to be always applicable relative to the first? Effectively this boils down to asking whether knowing Vec<T>: Clone allows us to deduce that T: Clone – and right now, we can’t know that. The problem is that the impls we have only go one way. That is, given the following impl:

impl<T> Clone for Vec<T> where T: Clone { .. }

we get a program clause like

forall<T> {
  (Vec<T>: Clone) :- (T: Clone)
}

but we need the reverse:

forall<T> {
  (T: Clone) :- (Vec<T>: Clone)
}

This is basically an extension of implied bounds; but we’d have to be careful. If we just create those reverse rules for every impl, then it would mean that removing a bound from an impl is a breaking change, and that’d be a shame.

We could address this in a few ways. The most obvious is that we might permit people to annotate impls indicating that they represent minimal conditions (i.e., that removing a bound is a breaking change).

Alternatively, I feel like there is some sort of feature “waiting” out there that lets us make richer promises about what sorts of trait impls we might write in the future: this would be helpful also to coherence, since knowing what impls will not be written lets us permit more things in downstream crates. (For example, it’d be useful to know that Vec<T> will never be Copy.)

Extension: Designating traits as “specialization predicates”

However, even when we consider the base impl, and even if we have some solution to reverse rules, we still can’t cover the use case of having “overlapping blanket impls”, like these two:

impl<T> Skip for T where T: Read { .. }
impl<T> Skip for T where T: Read + Seek { .. }

Here we have a trait Skip that (presumably) lets us skip forward in a file. We can supply one default implementation that works for any reader, but it’s inefficient: it would just read and discard N bytes. It’d be nice if we could provide a more efficient version for those readers that implement Seek. Unfortunately, this second impl is not always applicable with respect to the first impl – it adds a new requirement, T: Seek, that does not follow from the bounds on the first impl nor the implied bounds.

You might wonder why this is problematic in the first place. The danger is that some other crate might have an impl for Seek that places lifetime constraints, such as:

impl Seek for &'static Foo { }

Now at code generation time, we won’t be able to tell if that impl applies, since we’ll have erased the precise region.

However, what we could do is allow the Seek trait to be designated as a specialization predicate (perhaps with an attribute like #[specialization_predicate]). Traits marked as specialization predicates would be limited so that every one of their impls must be always applicable (our original predicate). This basically means that, e.g., a “reader” cannot conditionally implement Seek – it has to be always seekable, or never. When determining whether an impl is always applicable, we can ignore where clauses that pertain to #[specialization_predicate] traits.

Adding a #[specialization_predicate] attribute to an existing trait would be a breaking change; removing it would be one too. However, it would be possible to take existing traits and add “specialization predicate” subtraits. For example, if the Seek trait already existed, we might do this:

impl<T> Skip for T where T: Read { .. }
impl<T> Skip for T where T: Read + SeekPredicate { .. }

#[specialization_predicate]
trait UnconditionalSeek: Seek {
  fn seek_predicate(&self, n: usize) {
    self.seek(n);
  }
}

Now streams that implement seek unconditionally (probably all of them) can add impl UnconditionalSeek for MyStream { } and get the optimization. Not as automatic as we might like, but could be worse.

Default impls need not be always applicable

This last example illustrates an interesting point. RFC 1210 described not only specialization but also a more flexible form of defaults that go beyond default methods in trait definitions. The idea was that you can define lots of defaults using a default impl. So the UnconditionalSeek trait at the end of the last section might also have been expressed:

#[specialization_predicate]
trait UnconditionalSeek: Seek {
}

default impl<T: Seek> UnconditionalSeek for T {
  fn seek_predicate(&self, n: usize) {
    self.seek(n);
  }
}

The interesting thing about default impls is that they are not (yet) a full impl. They only represent default methods that real impls can draw upon, but users still have to write a real impl somewhere. This means that they can be exempt from the rules about being always applicable – those rules will be enforced at the real impl point. Note for example that the default impl above is not always available, as it depends on Seek, which is not an implied bound anywhere.

Conclusion

I’ve presented a refinement of specialization in which we impose one extra condition on the specializing impl: not only must it be a subset of the base impl(s) that it specializes, it must be always applicable, which means basically that if we are given a set of types T where we know:

  • the base impl was proven by the type checker to apply to T
  • the types T were proven by the type checker to be well-formed
  • and the specialized impl unifies with the lifetime-erased versions of T

then we know that the specialized impl applies.

The beauty of this approach compared with past approaches is that it preserves the existing role of the type checker and the code generator. As today in Rust, the type checker always knows the full region details, but the code generator can just ignore them, and still be assured that all region data will be valid when it is accessed.

This implies for example that we don’t need to impose the restrictions that aturon discussed in their blog post: we can allow specialized associated types to be resolved in full by the type checker as long as they are not marked default, because there is no danger that the type checker and trans will come to different conclusions.

Thoughts?

I’ve opened an internals thread on this post. I’d love to hear whether you see a problem with this approach. I’d also like to hear about use cases that you have for specialization that you think may not fit into this approach.

Footnotes

  1. We don’t say it so much anymore, but in the olden days of Rust, the phrase “max min” was very “en vogue”; I think we picked it up from some ES6 proposals about the class syntax.

  2. Note: an impl is said to specialize itself.

  3. Let me give a shout out here to scalexm, who recently emerged with an elegant solution for how to model implied bounds in Chalk.