GCC Rust in 2021

Thanks again to Open Source Security, inc and Embecosm for their ongoing support for this project.

Overview

GCC Rust is a project dating back to 2014, when Rust was still in flux. It became tough to keep up with everything, so the project stalled out. It wasn’t until early 2019 I joined forces with Joel, who wrote a new lexer, parser and AST, which vastly helped get the project off the ground again. During 2019 we started making steady progress getting the project going again in terms of getting the compile pipeline setup and initial proof of concepts on some type checking and code generation.

Come late 2020, I went full time with the sponsorship from Open Source Security, inc and Embecosm. With the mutual connections in the GCC Steering Committee, I have been fortunate to be surrounded by so much experience to keep me right and the chance to work with such talented people. Of course, there is always the risk of how much you don’t know in large projects, so to mitigate this, I have tried to keep milestones as short as possible with clear goals of the Rust code we want to support at each phase. My personal aim has always been to maximize the funding and trust as best I can from those who sponsor me to the community that this compiler is part of. This is why my status reports are freely available.

My personal motivation has always been that Rust is a fascinating language primarily because of the Type System and modern features like the Match Expression being available in a native language without a garbage collector. The project initially started because I love working with GCC front-end code, and it was a proof of concept to try and clean the language in deep detail.

Progressively it began to evolve into an idea that we could have someday a complete GCC quality front-end for Rust. The Rust for Linux project seems more a question of time than Rusts capabilities at this point. Upon merging the Rust support into the Linux kernel, it will mean a few issues will arise:

Linux is compiled mainly with GCC, but the Rustc toolchain is LLVM based
Since its inception, it will not be possible to build Linux with a full GPL-based toolchain for the first time.
GCC Plugins within the kernel tree
Mixed binaries (GCC vs LLVM) will lose the support of both: CFI, LTO

The Rust for Linux project is a major for Linux, in my opinion, regardless of GCC Rust. Though I believe GCC Rust will fill in these gaps and provide some more benefits.

Eliminate the mixed binary problem as this is a full GCC based toolchain for Rust.
- https://www.cs.ucy.ac.cy/~eliasathan/papers/tops20.pdf
Provide support for GCC plugins on Rust code
GCC Plugins access
- Many security passes are based on GCC plugins
GPL based toolchain
GCC Rust will have a slower pace of change than Rustc
Support all GCC backends
Gain access to all GCC flags and optimizers
Provide contracts to LLVM in benchmarks
Investment into the GCC toolchain is preserved
- Vendors already invested in GCC will gain GCC support automatically
- Backporting the front-end onto older GCC’s
Architectures support

Thanks to the matthieum (https://www.reddit.com/r/rust/comments/racv8r/gcc_rust_monthly_report_12_november_2021/hnm99pb/?context=3) for these additional points:

A GCC front-end grants instantaneous recognition to a language.
GCC is the default compiler that most Linux distributions ship. This lowers the bar for Rust adoption.
It’s easier to convince distribution maintainers to enable another front-end by default in a project than to set up another toolchain.

Overall we maintain that GCC Rust is not in competition with Rustc but will live alongside Rustc as an alternative that people may prefer depending on their situation or requirements. However, it must also be said we do not wish to be a mechanism to bypass the Rust RFC process for language changes.

Moving forward into next year, we again hope to be part of Google Summer of Code 2022 and to push forward and compile the Rust standard library and begin testing the Rustc test suite this time next year.

Thanks

Thanks so much to all the contributors of GCC Rust, precisely:

Brad Spengler from Open Source Security, inc.
Jeremy Bennett from Embecosm.
David Edelsohn from GCC steering committee.

You have made 2021 a fantastic year for me in particular, and I aim to continue working hard to maximise your faith in this project. Each of you has decades of experience, which is vital for the project’s future success.

Achievements

GSOC 2021

Even though GCC Rust is still in an early development phase, I give regular updates to the GCC steering committee, which has provided two major benefits.

Getting project advice from those who have decades of experience
Providing the privilege of being part of Google Summer of Code 2021

Two student proposals stood out to me:

Dead Code Analysis
Cargo integration

The dead-code analysis is a crucial pass in any language that helps clean up code. Although the compiler is not complete, Thomas Young has provided a solid framework to enhance over time. When people use Rust, they rarely invoke the compiler directly but use the Rust build tool called Cargo. This tool is a key component that we must have as part of our toolchain. Arthur had a clever proposal to take advantage of the internals of Cargo to add gccrs as a subcommand.

See: https://summerofcode.withgoogle.com/archive/2021/organizations/5653860256841728

Talks

Rust Bash: https://youtu.be/Gm6gw1fqMwA?t=4035
BCS and Rust London: (used to be on youtube)
LPC 2021: https://www.youtube.com/watch?v=chs9LxT9PAg
Huawei – Modern Compilers and Languages Technologies 2021 Conference: Private

Technical

Below are just some of the technical achievements I wish to write about.

Unions and Enums

Mark Wielaard has been an active contributor through 2021 and successfully navigated an early code base, which was not an easy task to give us union support within the compiler. This was just one of those occasions that made my week. Knowing that the compiler code was at least navigatable by others. He was able to take this forward into enums which were big tasks involving a refactor of our Algebraic data type system to support multiple variants. Still, he was able to do a great job. Thanks, Mark.

Module’s

We have another Marc who joined in the craic early on. One of his major contributions has been the module keyword in Rust. This complex change involved hitting many issues with our Path resolution system. It was still pretty naive early on but tackling this rattled out the bugs pretty early, which is key to keeping to a tight timeline. He is also the driving force behind adding all the alternative Rust implementations onto compiler explorer and rapidly fixing any bugs, and supporting any new contributors to GCC Rust in general. Thanks, Marc.

GDB debugging

Upon starting this project, we tried our best to track location info from the get-go within the AST. This is then preserved during HIR lowering. When creating our gimple, the location info enhances all types and statement/expression info, so we already have working GDB support. However, there are many gaps and places where we can improve this from missing location info and getting proper rust gdb integration.

(gdb) start
Temporary breakpoint 1 at 0x40126a: file ../gccrs/gcc/testsuite/rust/compile/torture/generics9.rs, line 14.
Starting program: /home/philbert/workspace/gcc/gccrs-build/test 
Temporary breakpoint 1, main () at ../gccrs/gcc/testsuite/rust/compile/torture/generics9.rs:14
14          let a: GenericStruct<i32> = GenericStruct::<i32>::new(123, 456);
(gdb) n
15          let aa: i32 = a.get();
(gdb) p a
$1 = {0: 123, 1: 456}
(gdb) n
18          let b: GenericStruct<u32> = GenericStruct::<_>::new(123, 456);
(gdb) s
TestCrate::GenericStruct<u32>::new (a=123, b=456) at ../gccrs/gcc/testsuite/rust/compile/torture/generics9.rs:5
5               GenericStruct(a, b)

We have had help from GDB developers to get our integrations spot on such as the tuple structs where fields must be prefixed with a double underscore for example.

Hello World

Hello World is always a vital piece of any milestone of a compiler for GCC Rust we have not implemented macro’s yet; however, we have taken advantage of unsafe blocks calling into the C ABI printf. This was key so we could enhance our test suite with execution tests to match output.

Traits and dynamic dispatch

Type-bound and dynamic dispatch were both aspects of Rust I was anxious about because it involves virtual dispatch, and the second requires compile-time resolution. These are key things to get right moving forward in the type system and code generation.

fn static_dispatch<T: Bar>(t: &T) {
    t.baz();
}

fn dynamic_dispatch(t: &dyn Bar) {
    t.baz();
}

fn main() {
    let a;
    a = &Foo(123);

    static_dispatch(a);
    dynamic_dispatch(a);
}

see: https://godbolt.org/z/71vhWsY4z

Road Bumps

Below are some of Rust’s more exciting parts that have made me look twice before implementing them. Some of which I think could benefit from more detailed documentation.

Tuple Struct initialisation

struct Foo(i64 bool);
 
pub fn main() {
   let a;
   a = Foo{0: 123, 1: true};
}

We hit this test case early on when trying different constructor variants for ADT’s; TupleStructs are usually constructed using the regular Tuple syntax, though this test case demonstrates how the compiler implicitly creates implicit field names of the index. So it is almost an implementation detail that this works the way it does rather than being specified in the syntax of the language, which is pretty neat.

see: https://godbolt.org/z/PEGEWae4f

Qualified Paths

As part of my work into traits, this includes qualified paths. I have been investigating some test cases around qualified paths, and this one had me confused: https://github.com/rust-lang/rust/blob/master/src/test/ui/qualified/qualified-path-params-2.rs.

The associated path <S as Tr>::A on its own will resolve to unit-struct S so when the final segment of ::f<u8> i would have assumed this would have resolved to the type of the impl function f substituted with u8. I don’t see why this is ambiguous. I have, however, received an explanation on the Rust Zulip server https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/Ambiguous.20associated.20types/near/251210283.

This is interesting since associated types are not supported on impl blocks; the rustc compiler will enforce that all associated types must come from QualifiedPaths. This is to force the path to always be of the form <A as B>::C. When inherently associated types are stabilized, this will need some work. The GCC Rust implementation currently ICE’s for this test case see https://github.com/Rust-GCC/gccrs/issues/843.

See: QualifiedPathInType

Never Type

Rust has a never type which is pretty interesting. In c/c++, you can use the attribute no_return to signify that this function is never going to return. In Rust, they take this concept to its full with the never type, so for example, below:

fn test() {
   let a = return;
   let b = a + 123;
   a = 456;
}

You can see ‘a’ is equal to the return expression, which will, in this context, never return. So even though everything after this, the first assignment will never get executed because of the return expression, Rust can fully type this function. The functionality here is down to a key piece of the type system, which is more evident in closures than here. Still, in the above ‘b = a + 123’, the types here never type plus an integer inference variable, as there is no implementation of never type plus integer, here Rust actually returns another inference variable with a specified type-bound that requires the ability to add numbers together. This means the final assignment allows the type system to know that ‘a’ is an integer, albeit an uninitialized one, but this code is still safe because of the return expression.

Method resolution

Method resolution is a multi-variant task; it involves the autoderef mechanism, but more importantly, there is an implicit ordering of the candidates that is key as plain old Impl block methods are always prefered over trait’s.

struct Foo {}

trait Bar where Self:Sized {
  fn bar(self) {
      println!("In trait def!")
  }
}

impl Foo {
  fn bar(self) {
    println!("In struct impl!")
  }
}

impl Bar for Foo {
  fn bar(self) {
    println!("In trait impl!")
  }
}

fn main() {
  let mut f = Foo{};
  f.bar();
}

This test case is all about showing even though there are two duplicate methods here, we enforce that the impl block method is prefered first.

see: https://rustc-dev-guide.rust-lang.org/method-lookup.html https://doc.rust-lang.org/nightly/nomicon/dot-operator.html

Add check for duplicate overlapping impl-items

Rust allows multiple impl blocks to give a generic data type specialization. But suppose the programmer adds a generic impl for a duplicate method. In that case, it will become impossible to distinguish the method ‘bar’ in each of these specialized impl blocks for method resolution. Since your receiver could be Foo<_ (inference variable)> which could resolve to Foo<isize> or Foo<char>. see: rustc –explain E0592

struct Foo<A>(A);

impl Foo<isize> {
    fn bar(self) -> isize {
        self.0
    }
}

impl Foo<char> {
    fn bar(self) -> char {
        self.0
    }
}

impl<T> Foo<T> {
    fn bar(self) -> T {
        self.0
    }
}

Support Dereference operator overloading

Deref operator overloading is a core piece of Rusts control flow mechanism, it adds in support for more complex method resolution cases as part of the autoderef mechanism. It also has served as a good test of the current state of the type system so far.

extern "C" {
    fn printf(s: *const i8, ...);
}

#[lang = "deref"]
pub trait Deref {
    type Target;

    fn deref(&self) -> &Self::Target;
}

impl<T> Deref for &T {
    type Target = T;

    fn deref(&self) -> &T {
        *self
    }
}

impl<T> Deref for &mut T {
    type Target = T;

    fn deref(&self) -> &T {
        *self
    }
}

struct Foo<T>(T);
impl<T> Deref for Foo<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

fn main() -> i32 {
    let foo: Foo<i32> = Foo(123);
    let bar: i32 = *foo;

    unsafe {
        let a = "%i\n\0";
        let b = a as *const str;
        let c = b as *const i8;

        printf(c, bar);
    }

    0
}

The interesting piece about dereferences is that the actual deref method that is implemented always returns a reference to the associated type ‘Target’, this implicitly requires the compiler call this method and because the trait and type checking ensures that the result is a reference it means it can safely be dereferenced by the compiler implicitly. I point this out because simply because the function prototype:

fn deref(&self) -> &Self::Target {
    &self.0
}

Here the function type is:

fn deref(self: &Foo<T>) -> &T { &self.0 }

So the dereference operation even on custom types is always going to return a reference. So the dereference operator overloading is a two step mechanism.

Overall Status

Contributors this year

Lines of Code (LOC)

average lines added per week: 1260.058
average lines deleted per week: 530.942

Language	files	blank	comment	code
C/C++ Header	128	13393	10071	52291
C++	40	5097	3695	24736
Rust	416	1195	820	5433
Markdown	2	50	0	119
C	1	38	46	113
Bourne Shell	1	16	12	101
Windows Module Definition	1	15	0	74
TOML	7	19	7	64
Expect	5	35	127	35
YAML	1	9	0	34
SUM:	602	19867	14778	83000

Overall Task Status

Category	Last Year	This Year	Delta
TODO	35	88	+53
In Progress	4	16	+12
Completed	6	257	+251

GitHub Issues

Test Cases

Category	Last Year	This Year	Delta
Passing	40	5411	+5371
XFAIL	4	21	–

make check-rust

Bugs

Category	Last Year	This Year	Delta
TODO	–	24	+24
In Progress	–	4	+4
Completed	–	90	+90

GitHub Bugs

Milestone Progress

Milestone	Last Year	This Year	Delta	Start Date	Completion Date	Target
Data Structures 1 – Core	61%	100%	+39%	30th Nov 2020	27th Jan 2021	29th Jan 2021
Control Flow 1 – Core	0%	100%	+100%	28th Jan 2021	10th Feb 2021	26th Feb 2021
Data Structures 2 – Generics	0%	100%	+100%	11th Feb 2021	14th May 2021	28th May 2021
Data Structures 3 – Traits	0%	100%	+100%	20th May 2021	17th Sept 2021	27th Aug 2021
Control Flow 2 – Pattern Matching	0%	%100	+100%	20th Sept 2021	9th Dec 2021	29th Nov 2021
Macros and cfg expansion	0%	0%	–	1st Dec 2021	–	28th Mar 2022
Imports and Visibility	0%	0%	–	29th Mar 2022	–	27th May 2022
Const Generics	0%	0%	–	30th May 2022	–	25th Jul 2022
Intrinsics	0%	0%	–	6th Sept 2021	–	30th Sept 2022

GitHub Milestones

Risks

Risk	Impact (1-3)	Likelihood (0-10)	Risk (I * L)	Mitigation
Rust Language Changes	3	7	21	Keep up to date with the Rust language on a regular basis
Going over target dates	3	5	15	Maintain status reports and issue tracking to stakeholders