GCC Rust Monthly Report #12 November 2021

Thanks again to Open Source Security, Inc. and Embecosm for their ongoing support of this project.

Milestone Progress

November was a busy month in which we added lang items and operator overloading, in particular dereference operator overloading, which is critical to the control flow of method resolution. The remaining tasks are merging enum code generation and initial pattern matching support for accessing enums. Pattern matching will be an ongoing task after this milestone, since it requires a lot of static analysis of the match arms, but that analysis is not required for code generation. The exciting thing about enums is that we can use GCC’s QUAL_UNION_TYPE, a GCC-specific tree similar to a tagged union you might write in C, which is currently only used by the Ada front-end. Ada is an interesting front-end because it has many similarities to Rust; it even has overflow traps: https://godbolt.org/z/5oTh79hnx

I was aiming to close out this milestone on 3rd December 2021 but missed that target, so it will likely be 8th December 2021 when I finish and merge the remaining cleanups to the enum and pattern matching work. Even so, this means we have won back nearly two weeks of the time lost during the previous traits milestone.

To close out the year, I will be testing GCC Rust in anger on some real Rust code (https://github.com/Rust-GCC/gccrs/issues/682), since finding bugs as early as possible is crucial to the project timeline. Because we don’t support macros yet, Philipp Krones has kindly been porting this project to compile with no_std and no_core, which means porting in all the lang items from libcore that the project requires. In turn, this means we are testing much of the functionality of libcore, which is exciting: once we finish the next milestone of macros and cfg expansion, we should be very close to compiling and using libcore.

Thank you to everyone who continues to support and work on the compiler.

Monthly Community Call

We had our 9th community call on 3rd December 2021 and you can find the meeting notes here: https://github.com/Rust-GCC/Reporting/blob/main/2021-12-03-community-call.md

Huawei – Modern Compilers and Languages Technologies 2021 Conference

Philip will be giving a talk with Philipp Krones and Jeremy Bennett on “The Rust Ecosystem and GCC-rs” on 10th December 2021.

Completed Activities

  • Tag ‘rust_fatal_error’ as ‘noreturn’ PR777 PR780
  • Add location info in AST::TypeBoundWhereClauseItem and HIR::TypeBoundWhereClauseItem PR778
  • Add new rust_internal_error for specific ICEs PR779
  • Merge Type Checking code for enums PR781
  • Refactor TyTy::ADTType into one which can contain multiple variants PR781
  • Get rid of lambda in AST::TypePath PR783
  • Track inline module scopes for path module imports PR785
  • Save make check-rust artifacts for GHA PR787
  • Add new -frust-crate option to specify crate name PR788
  • Get rid of lambda TyTy::TupleType iterate fields PR791
  • Change DefId from uint64_t with bitmask into a struct PR792
  • Fix unhandled TypeBounds PR794
  • Documentation for clang-format usage PR795 PR802 PR803
  • Handle forward declared items within blocks PR796
  • Fix some missing cases of constant folding PR798
  • Merge LangItems work and operator overloading support PR801
  • Remove Btype, Bexpression, etc. abstractions over gcc trees PR805
  • Fix MethodCalls for covariant impl blocks PR810
  • Remove implicit name hack for trait associated types PR811
  • Deref Operator Overloading support PR818 PR821 PR823
  • BugFix QualifiedPaths within traits PR812 PR813
  • BugFix name mangling on QualifiedPaths PR819
  • BugFix mutability within the type system for reference and pointer types PR820 PR817
  • GCC requires TREE_ADDRESSABLE on declarations that require address operations PR814
  • Cleanup generic substitution code PR822

Contributors this month

Overall Task Status

Category | Last Week | This Week | Delta
TODO | 99 | 93 | -6
In Progress | 12 | 14 | +2
Completed | 234 | 251 | +17
GitHub Issues

Testcases

Category | Last Week | This Week | Delta
Passing | 4844 | 5337 | +493
XFAIL | 21 | 21 |
make check-rust

Bugs

Category | Last Week | This Week | Delta
TODO | 22 | 24 | +2
In Progress | 3 | 4 | +1
Completed | 84 | 90 | +6
GitHub Bugs

Milestone Progress

Milestone | Last Week | This Week | Delta | Start Date | Completion Date | Target
Data Structures 1 – Core | 100% | 100% | | 30th Nov 2020 | 27th Jan 2021 | 29th Jan 2021
Control Flow 1 – Core | 100% | 100% | | 28th Jan 2021 | 10th Feb 2021 | 26th Feb 2021
Data Structures 2 – Generics | 100% | 100% | | 11th Feb 2021 | 14th May 2021 | 28th May 2021
Data Structures 3 – Traits | 100% | 100% | | 20th May 2021 | 17th Sept 2021 | 27th Aug 2021
Control Flow 2 – Pattern Matching | 55% | 97% | +42% | 20th Sept 2021 | | 29th Nov 2021
Macros and cfg expansion | 0% | 0% | | 1st Dec 2021 | | 28th Mar 2022
Imports and Visibility | 0% | 0% | | 29th Mar 2022 | | 27th May 2022
Const Generics | 0% | 0% | | 30th May 2022 | | 25th Jul 2022
Intrinsics | 0% | 0% | | 6th Sept 2021 | | 30th Sept 2022
GitHub Milestones

Risks

Risk | Impact (1-3) | Likelihood (0-10) | Risk (I * L) | Mitigation
Rust Language Changes | 3 | 7 | 21 | Keep up to date with the Rust language on a regular basis
Going over target dates | 3 | 5 | 15 | Maintain status reports and issue tracking to stakeholders

Planned Activities

Detailed changelog

Refactor TyTy::ADTType

To support enums, we could have implemented a new TyTy module and then updated the type-checking code to match. But in many ways an enum is just another kind of algebraic data type, and treating it as one has the side effect of canonicalizing how we work with these types instead of inventing new paths through the compiler. The ADT type was originally designed for unit structs, structs and tuple structs. Really, though, an enum is an ADT with multiple variants, structs and tuple structs are ADTs with a single variant, and a unit struct is one with no variants. Decoupling ADTs into variants was a rather large refactor, but it has helped tackle some technical debt along the way. Thanks to flip1995 for pointing us in this direction.
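To picture the shape of this refactor, here is a small Rust model of an ADT type that holds a list of variants, so that structs are simply ADTs with one variant and enums have one entry per variant. The names AdtKind, VariantDef, AdtType and field are hypothetical; this is a sketch of the idea, not the actual C++ TyTy::ADTType class in gccrs.

```rust
// Hypothetical model of "an ADT is a list of variants"; not the gccrs classes.
#[derive(Debug, Clone, PartialEq)]
enum AdtKind {
    Struct,
    TupleStruct,
    Enum,
}

#[derive(Debug, Clone)]
struct Field {
    name: String,
    ty: String, // types kept as strings to keep the sketch small
}

#[derive(Debug, Clone)]
struct VariantDef {
    name: String,
    fields: Vec<Field>,
}

#[derive(Debug, Clone)]
struct AdtType {
    name: String,
    kind: AdtKind,
    variants: Vec<VariantDef>,
}

impl AdtType {
    // Structs and tuple structs: an ADT with exactly one variant.
    fn new_struct(name: &str, kind: AdtKind, fields: Vec<Field>) -> Self {
        AdtType {
            name: name.to_string(),
            kind,
            variants: vec![VariantDef { name: name.to_string(), fields }],
        }
    }

    // Enums: an ADT with one entry per variant.
    fn new_enum(name: &str, variants: Vec<VariantDef>) -> Self {
        AdtType { name: name.to_string(), kind: AdtKind::Enum, variants }
    }

    fn is_enum(&self) -> bool {
        self.kind == AdtKind::Enum
    }

    fn number_of_variants(&self) -> usize {
        self.variants.len()
    }

    // The same lookup path works for structs and enums alike.
    fn lookup_variant(&self, name: &str) -> Option<&VariantDef> {
        self.variants.iter().find(|v| v.name == name)
    }
}

fn field(name: &str, ty: &str) -> Field {
    Field { name: name.to_string(), ty: ty.to_string() }
}

fn main() {
    let point = AdtType::new_struct(
        "Point",
        AdtKind::Struct,
        vec![field("x", "i32"), field("y", "i32")],
    );
    let shape = AdtType::new_enum(
        "Shape",
        vec![
            VariantDef { name: "Circle".to_string(), fields: vec![field("0", "f32")] },
            VariantDef { name: "Empty".to_string(), fields: vec![] },
        ],
    );
    println!("{} has {} variant(s)", point.name, point.number_of_variants());
    println!("{} has {} variant(s)", shape.name, shape.number_of_variants());
}
```

The benefit of this design is that code paths such as variant lookup do not need to special-case enums versus structs.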

Add new rust_internal_error

This new API is designed to distinguish internal compiler errors (ICEs) from errors in the program being compiled. Assertions are useful, but sometimes you want to be able to provide extra contextual information.

Handle forward declared items within blocks

Rust allows items such as functions to be declared at the bottom of a BlockExpr and referenced at any point within that block, for example: https://godbolt.org/z/PGqnz1nve

pub fn main() {
    let a;
    a = foo { a: 123, b: 456f32 };

    let mut a = 123;
    a = bar(a);

    let mut b = 456f32;
    b = bar(b);

    let aa = X;

    let bb:[i32; X];

    fn bar<T>(x: T) -> T {
        x
    }

    struct foo {
        a: i32,
        b: f32,
    };

    const X:usize = 2;
}
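One way to support this is a two-pass walk over the block. The first pass records every item declared anywhere in the block. The second pass resolves statements in order, where items are always visible but locals only become visible after their let. This is a minimal sketch on a hypothetical statement model (Stmt and resolve_block are made-up names), not the gccrs name-resolution code.

```rust
use std::collections::HashSet;

// Hypothetical, heavily simplified statements of a block.
enum Stmt {
    // An item such as `fn bar` or `const X`: visible throughout the block.
    Item(&'static str),
    // `let name = <uses...>;`: the local is visible only after this statement.
    Let(&'static str, Vec<&'static str>),
    // An expression statement that uses some names.
    Expr(Vec<&'static str>),
}

// Returns the names that could not be resolved, in order of use.
fn resolve_block(stmts: &[Stmt]) -> Vec<&'static str> {
    // Pass 1: collect every item declared anywhere in the block.
    let items: HashSet<&'static str> = stmts
        .iter()
        .filter_map(|s| match s {
            Stmt::Item(name) => Some(*name),
            _ => None,
        })
        .collect();

    // Pass 2: walk statements in order; locals enter scope as we go.
    let mut locals: HashSet<&'static str> = HashSet::new();
    let mut unresolved = Vec::new();
    for stmt in stmts {
        let uses: &[&'static str] = match stmt {
            Stmt::Item(_) => &[],
            Stmt::Let(_, names) | Stmt::Expr(names) => names.as_slice(),
        };
        for &name in uses {
            if !items.contains(&name) && !locals.contains(&name) {
                unresolved.push(name);
            }
        }
        if let Stmt::Let(name, _) = stmt {
            locals.insert(*name);
        }
    }
    unresolved
}

fn main() {
    // `bar` and `X` are used before they are declared, which is fine for items.
    let block = vec![
        Stmt::Let("a", vec!["bar"]),
        Stmt::Let("aa", vec!["X"]),
        Stmt::Expr(vec!["a", "aa"]),
        Stmt::Item("bar"),
        Stmt::Item("X"),
    ];
    println!("unresolved: {:?}", resolve_block(&block)); // prints "unresolved: []"
}
```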

Fix unhandled TypeBounds

This test case from the Rust testsuite https://github.com/rust-lang/rust/blob/d5a0c7cb036032288a4a5443b54ba061ec12ee26/src/test/ui/higher-rank-trait-bounds/hrtb-fn-like-trait-object.rs exposed bugs with unhandled type bounds in type aliases:

type FnObject<'b> = dyn for<'a> FnLike<&'a isize, &'a isize> + 'b;

The bug was that HIR had two ways to represent these bounds, so we took the opportunity to desugar both AST types:

  • AST::TraitObjectType
  • AST::TraitObjectTypeOneBound

into a single HIR::TraitObjectType.
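The desugaring can be pictured as a lowering function that maps both AST forms onto one HIR node, so later passes only have a single case to handle. The types below (AstType, HirTraitObjectType, lower) are a hypothetical simplification in which bounds are plain strings; they are not the gccrs classes.

```rust
// Hypothetical simplified AST: two ways of writing a trait object type.
enum AstType {
    // e.g. `dyn Foo` with exactly one bound
    TraitObjectTypeOneBound { bound: String, is_dyn: bool },
    // e.g. `dyn for<'a> FnLike<&'a isize, &'a isize> + 'b`
    TraitObjectType { bounds: Vec<String>, is_dyn: bool },
}

// Hypothetical HIR: a single representation for both forms.
#[derive(Debug, PartialEq)]
struct HirTraitObjectType {
    bounds: Vec<String>,
    is_dyn: bool,
}

fn lower(ty: AstType) -> HirTraitObjectType {
    match ty {
        // The one-bound form is just a bound list with a single element.
        AstType::TraitObjectTypeOneBound { bound, is_dyn } => HirTraitObjectType {
            bounds: vec![bound],
            is_dyn,
        },
        AstType::TraitObjectType { bounds, is_dyn } => HirTraitObjectType { bounds, is_dyn },
    }
}

fn main() {
    let one = lower(AstType::TraitObjectTypeOneBound { bound: "Foo".to_string(), is_dyn: true });
    let many = lower(AstType::TraitObjectType {
        bounds: vec!["for<'a> FnLike<&'a isize, &'a isize>".to_string(), "'b".to_string()],
        is_dyn: true,
    });
    println!("{:?}", one);
    println!("{:?}", many);
}
```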

Operator Overloading

Rust supports overloading many different operators. We have added support for all the regular arithmetic operators (+, -, *, /, %), compound assignments such as +=, and the unary negation operators (!x and -x). Deref operations are also supported, but there are a few bugs to work through there to get them right. Since we cannot compile libcore yet, you must define the lang items you want to use within your own crate; we have taken the same traits from libcore to make sure we can compile them correctly.

extern "C" {
    fn printf(s: *const i8, ...);
}

#[lang = "add"]
pub trait Add<Rhs = Self> {
    type Output;

    fn add(self, rhs: Rhs) -> Self::Output;
}

impl Add for i32 {
    type Output = i32;

    fn add(self, other: i32) -> i32 {
        self + other
    }
}

struct Foo(i32);
impl Add for Foo {
    type Output = Foo;

    fn add(self, other: Foo) -> Foo {
        Foo(self.0 + other.0)
    }
}


fn main() {
    let res;
    res = Foo(1) + Foo(2);

    unsafe {
        let a = "%i\n\0";
        let b = a as *const str;
        let c = b as *const i8;

        printf(c, res.0);
    }
}

The purpose of this test case is to ensure that when we add two Foo structures together, the operation breaks down into calling the operator overload for i32. Note that when optimizations are turned on, these function calls are fully inlined, just like C++ operator overloads.

See compiler explorer for more information https://godbolt.org/z/95bc4eWPW
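With the standard library available (rustc rather than a no_core crate), the same idea can be checked directly: a + b is sugar for Add::add(a, b), a += b for AddAssign::add_assign(&mut a, b), and -a for Neg::neg(a). This sketch uses the real std::ops traits; Foo is the same toy type as in the test case above, with a few extra derives so results can be compared.

```rust
use std::ops::{Add, AddAssign, Neg};

#[derive(Debug, Clone, Copy, PartialEq)]
struct Foo(i32);

impl Add for Foo {
    type Output = Foo;
    fn add(self, other: Foo) -> Foo {
        // Builtin i32 addition on the inner values.
        Foo(self.0 + other.0)
    }
}

impl AddAssign for Foo {
    fn add_assign(&mut self, other: Foo) {
        self.0 += other.0;
    }
}

impl Neg for Foo {
    type Output = Foo;
    fn neg(self) -> Foo {
        Foo(-self.0)
    }
}

fn main() {
    // Operator syntax...
    let sugar = Foo(1) + Foo(2);
    // ...and the explicit trait call it desugars to.
    let explicit = <Foo as Add>::add(Foo(1), Foo(2));
    assert_eq!(sugar, explicit);

    let mut acc = Foo(10);
    acc += Foo(5); // AddAssign::add_assign(&mut acc, Foo(5))
    let neg = -acc; // Neg::neg(acc)
    println!("{} {} {}", sugar.0, acc.0, neg.0); // prints "3 15 -15"
}
```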

Covariant Self’s within impl blocks

Impl blocks in Rust can be written for any type, without bounds, which means the Self type of the impl block in this example is a reference to a generic type parameter. Method calls must therefore handle this case. Method resolution breaks down into two phases: probing for candidates, then resolving the actual method from those candidates. Probing searches for a function with the right name whose impl block Self type matches, using the autoderef mechanism to match the self parameter. How candidate probing works for method calls is still a little unclear to me, but I believe the correct mechanism is to look for any impl block containing a function with the right name, then check whether our receiver can be autoderef’d to that impl block’s implicit Self type; this gives us all the potential candidates. Then we autoderef on the lowercase self parameter.

pub trait Foo {
    type Target;

    fn bar(&self) -> &Self::Target;
}

impl<T> Foo for &T {
    type Target = T;

    fn bar(&self) -> &T {
        *self
    }
}

pub fn main() {
    let a: i32 = 123;
    let b: &i32 = &a;

    b.bar();
}
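The probing described above can be sketched as a loop over the receiver’s autoderef chain: at each step, try the type as-is and then with an automatic &, and check whether any impl block with a method of that name expects that receiver type. This is a simplified model with hypothetical types (Ty, ImplBlock, probe); it only handles &self methods and & autoref, ignores traits-in-scope rules, and is not the gccrs implementation. The generic T in impl<T> Foo for &T is modelled as Ty::Param, which matches anything.

```rust
// Hypothetical, heavily simplified model of method-call probing.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    I32,
    Ref(Box<Ty>),
    RefMut(Box<Ty>),
    // A generic type parameter such as the `T` in `impl<T> Foo for &T`.
    Param,
}

// Does the (possibly generic) `pattern` match the concrete type `ty`?
fn matches(pattern: &Ty, ty: &Ty) -> bool {
    match (pattern, ty) {
        (Ty::Param, _) => true,
        (Ty::I32, Ty::I32) => true,
        (Ty::Ref(p), Ty::Ref(t)) | (Ty::RefMut(p), Ty::RefMut(t)) => matches(p, t),
        _ => false,
    }
}

struct ImplBlock {
    self_ty: Ty,
    method: &'static str, // a single `&self` method per impl, for brevity
}

// One builtin deref step: `&U` / `&mut U` -> `U`.
fn deref_once(ty: &Ty) -> Option<Ty> {
    match ty {
        Ty::Ref(inner) | Ty::RefMut(inner) => Some((**inner).clone()),
        _ => None,
    }
}

// Returns (number of derefs, whether an autoref was added) for the first match.
fn probe(receiver: &Ty, name: &str, impls: &[ImplBlock]) -> Option<(usize, bool)> {
    let mut current = receiver.clone();
    let mut derefs = 0;
    loop {
        let by_value = current.clone();
        let autoref = Ty::Ref(Box::new(current.clone()));
        for (candidate, added_ref) in [(by_value, false), (autoref, true)] {
            // A `&self` method on `Self` expects a receiver of type `&Self`.
            let found = impls.iter().any(|imp| {
                let expected = Ty::Ref(Box::new(imp.self_ty.clone()));
                imp.method == name && matches(&expected, &candidate)
            });
            if found {
                return Some((derefs, added_ref));
            }
        }
        current = deref_once(&current)?;
        derefs += 1;
    }
}

fn main() {
    // impl<T> Foo for &T { fn bar(&self) ... }
    let impls = [ImplBlock { self_ty: Ty::Ref(Box::new(Ty::Param)), method: "bar" }];
    // let b: &i32 = &a; b.bar();
    let b = Ty::Ref(Box::new(Ty::I32));
    println!("{:?}", probe(&b, "bar", &impls)); // prints "Some((0, true))": autoref `&b`
}
```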

Remove GCC abstraction types

  • The goal of GCC Rust has always been to make a GCC quality front-end for Rust.
    • This goal takes priority over any long-term goal such as porting the code to other compiler platforms.
  • The GCC IR is very suitable for further static analysis, and the abstractions will make this very awkward.
    • In the long term, we could potentially look at building a borrow checker at the GENERIC tree level, which might have some interesting code to share with wider GCC.
  • Constant Folding
    • Const Generics will be very awkward until this is removed.
    • Features such as constant folding do not fit well with the abstraction as it currently stands.

So overall removing the abstraction is going to make some things much easier to work with. For example, in the short term, code generation for unions/ADTs/match-expr could be simplified a lot if we remove this. It also might help attract more GCC people to work with the backend code generation piece to clean up the code here.

See: https://github.com/Rust-GCC/gccrs/issues/412

GCC TREE_ADDRESSABLE

GCC requires VAR_DECLs and PARAM_DECLs to be marked with TREE_ADDRESSABLE when the declaration will be borrowed (its address is taken with ‘&’). This includes the implicit borrows introduced by autoderef in method resolution and operator overloading. When TREE_ADDRESSABLE is not set, the optimizers are free to keep the declaration in a register, since it does not need an address in memory, but this means we end up with cases like this:

#[lang = "add_assign"]
pub trait AddAssign<Rhs = Self> {
    fn add_assign(&mut self, rhs: Rhs);
}

impl AddAssign for i32 {
    fn add_assign(&mut self, other: i32) {
        *self += other
    }
}

fn main() {
    let mut a = 1;
    a += 2;
}

This generated GENERIC IR like the following:

i32 main ()
{
  i32 a.1; // <-- This is the copy
  i32 D.86;
  i32 a;

  a = 1;
  a.1 = a; // <-- Taking a copy

  <i32 as AddAssign>::add_assign (&a.1, 2);
  //                               ^
  //                              ----

  D.86 = 0;
  return D.86;
}

You can see that GCC automatically makes a copy of the VAR_DECL, so add_assign modifies the copy a.1 rather than a itself, which is incorrect code generation. With TREE_ADDRESSABLE set, the result looks like this:

i32 main ()
{
  i32 D.86;
  i32 a;

  a = 1;
  <i32 as AddAssign>::add_assign (&a, 2);
  D.86 = 0;
  return D.86;
}

The fix here now marks the declarations appropriately for when we need their address or not which then allows the GCC optimizers to work as we expect. For more info see this useful comment https://github.com/Rust-GCC/gccrs/blob/0024bc2f028369b871a65ceb11b2fddfb0f9c3aa/gcc/tree.h#L634-L649
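Conceptually, the fix amounts to a pass over the function body that flags every declaration whose address is taken, whether by an explicit & or by an implicit borrow inserted for autoderef or operator overloading, so that TREE_ADDRESSABLE can be set on exactly those declarations. The sketch below models this on a hypothetical mini expression tree (Expr, collect_addressable, addressable_vars are made-up names); the real fix works on GCC trees in C++.

```rust
use std::collections::HashSet;

// Hypothetical mini IR, not GCC trees.
enum Expr {
    Var(&'static str),
    Lit(i32),
    // `&e`, either written by the user or inserted implicitly (autoref).
    AddrOf(Box<Expr>),
    Call(&'static str, Vec<Expr>),
    Assign(&'static str, Box<Expr>),
}

// Record the names of all variables whose address is taken inside `e`.
fn collect_addressable(e: &Expr, out: &mut HashSet<&'static str>) {
    match e {
        Expr::AddrOf(inner) => {
            if let Expr::Var(name) = &**inner {
                out.insert(*name);
            }
            collect_addressable(inner, out);
        }
        Expr::Call(_, args) => {
            for arg in args {
                collect_addressable(arg, out);
            }
        }
        Expr::Assign(_, rhs) => collect_addressable(rhs, out),
        Expr::Var(_) | Expr::Lit(_) => {}
    }
}

// Declarations in this set would get TREE_ADDRESSABLE; the rest may live in registers.
fn addressable_vars(body: &[Expr]) -> HashSet<&'static str> {
    let mut out = HashSet::new();
    for e in body {
        collect_addressable(e, &mut out);
    }
    out
}

fn main() {
    // let mut a = 1; a += 2;  lowered roughly as:  a = 1; add_assign(&a, 2)
    let body = vec![
        Expr::Assign("a", Box::new(Expr::Lit(1))),
        Expr::Call(
            "add_assign",
            vec![Expr::AddrOf(Box::new(Expr::Var("a"))), Expr::Lit(2)],
        ),
    ];
    for name in addressable_vars(&body) {
        println!("mark {} as TREE_ADDRESSABLE", name);
    }
}
```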

Qualified Path BugFix

We found that our implementation of qualified paths relied on some implicitly injected names in the name-resolution process, so that we could at least resolve the root of the qualified path. This implementation was never going to hold up, but it served as a simple hack to get the type system off the ground during the traits milestone. These hacks and implicit names are now removed, and qualified paths are resolved during the type-checking pass, just like TypePaths. The bug here was that in the qualified path “<Self as Foo>::A” the root “<Self as Foo>” could not be resolved, because no implicit name was generated for it. Now the type system properly projects Self as Foo and then probes for A, which means it can handle more complex qualified paths.

pub trait Foo {
    type A;

    fn boo(&self) -> <Self as Foo>::A;
}

fn foo2<I: Foo>(x: I) {
    x.boo();
}
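For reference, the same trait compiles and runs with rustc once it has a concrete implementation: <Self as Foo>::A is projected to the implementing type’s associated type, here u32. The Counter type is a made-up example, and foo2 is changed to return the value so that it can be checked.

```rust
pub trait Foo {
    type A;

    fn boo(&self) -> <Self as Foo>::A;
}

// Hypothetical implementor used only for illustration.
struct Counter(u32);

impl Foo for Counter {
    // For `Counter`, `<Self as Foo>::A` resolves to `u32`.
    type A = u32;

    fn boo(&self) -> u32 {
        self.0 + 1
    }
}

// Generic caller: the return type is the projection `<I as Foo>::A`.
fn foo2<I: Foo>(x: I) -> I::A {
    x.boo()
}

fn main() {
    let v: u32 = foo2(Counter(41));
    println!("{}", v); // prints "42"
}
```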

Add implicit indirection to array access

When we have an array index expression, Rust allows the array operand to be a reference to an array, and the compiler is required to insert the implicit indirection. Note: Rust also supports this implicit indirection for tuple and struct field access.

fn foo(state: &mut [u32; 16], a: usize) {
    state[a] = 1;
}
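The implicit indirection means the indexing above behaves as if the reference had been dereferenced explicitly, i.e. (*state)[a] = 1. The runnable sketch below checks with rustc that both spellings do the same thing, and shows the same auto-deref for tuple field access; the function names are made up for illustration.

```rust
// Indexing through the reference: the compiler inserts the `*` for us.
fn set_implicit(state: &mut [u32; 16], a: usize) {
    state[a] = 1;
}

// The same operation with the indirection written out explicitly.
fn set_explicit(state: &mut [u32; 16], a: usize) {
    (*state)[a] = 1;
}

// Field access auto-derefs in the same way: `pair.0` is `(*pair).0`.
fn first(pair: &(u32, u32)) -> u32 {
    pair.0
}

fn main() {
    let mut x = [0u32; 16];
    let mut y = [0u32; 16];
    set_implicit(&mut x, 3);
    set_explicit(&mut y, 3);
    assert_eq!(x, y);
    println!("{:?}", &x[..4]); // prints "[0, 0, 0, 1]"
    println!("{}", first(&(7, 8))); // prints "7"
}
```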

Support Dereference operator overloading

Deref operator overloading is a core piece of Rust’s control flow mechanism: it adds support for more complex method resolution cases as part of the autoderef mechanism. It has also served as a good test of the current state of the type system.

extern "C" {
    fn printf(s: *const i8, ...);
}

#[lang = "deref"]
pub trait Deref {
    type Target;

    fn deref(&self) -> &Self::Target;
}

impl<T> Deref for &T {
    type Target = T;

    fn deref(&self) -> &T {
        *self
    }
}

impl<T> Deref for &mut T {
    type Target = T;

    fn deref(&self) -> &T {
        *self
    }
}

struct Foo<T>(T);
impl<T> Deref for Foo<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

fn main() -> i32 {
    let foo: Foo<i32> = Foo(123);
    let bar: i32 = *foo;

    unsafe {
        let a = "%i\n\0";
        let b = a as *const str;
        let c = b as *const i8;

        printf(c, bar);
    }

    0
}

The interesting thing about dereferences is that the deref method that is implemented always returns a reference to the associated type ‘Target’. The compiler must call this method implicitly, and because trait resolution and type checking guarantee that the result is a reference, the compiler can then safely dereference it implicitly. I point this out because of the function prototype:

fn deref(&self) -> &Self::Target {
    &self.0
}

With Self substituted, the function type is:

fn deref(self: &Foo<T>) -> &T { &self.0 }

So dereferencing, even on custom types, always goes through a method that returns a reference, which makes dereference operator overloading a two-step mechanism: call deref to obtain a reference, then apply the builtin dereference to that reference.
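The two steps can be made visible with the standard library’s std::ops::Deref: for a type that is not a reference or raw pointer, *foo is equivalent to *Deref::deref(&foo), a call that returns &Target followed by a builtin dereference of that reference. This mirrors the test case above, but uses rustc and std instead of a no_core crate.

```rust
use std::ops::Deref;

struct Foo<T>(T);

impl<T> Deref for Foo<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

fn main() {
    let foo: Foo<i32> = Foo(123);

    // Operator form.
    let a: i32 = *foo;
    // Step 1: call `deref`, which returns `&i32`.
    let r: &i32 = Deref::deref(&foo);
    // Step 2: builtin dereference of that reference.
    let b: i32 = *r;

    assert_eq!(a, b);
    println!("{}", a); // prints "123"
}
```

The same mechanism is what lets method calls autoderef through custom types, e.g. calling a String method on a Foo<String>.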

2 thoughts on “GCC Rust Monthly Report #12 November 2021”

  1. Hi,

    > Huawei – Modern Compilers and Languages Technologies 2021 Conference
    >
    >Philip will be giving a talk with Philipp Krones and Jeremy Bennet on “The Rust Ecosystem and >GCC-rs” on 10th December 2021.

    Do you have some more information on this conference? Is it public/online? I couldn’t find any information on it.

    1. Hi Jonathan at the moment I don’t have much info on this but I have CC’d you into an email with one of the organizers who is also from Huawei so you may be able to join since you both work for the same company 😀
