GCC Rust Monthly Report #14 February 2022

Thanks again to Open Source Security, Inc. and Embecosm for their ongoing support for this project.

Milestone Progress

February was a big month for GCC Rust: our former Google Summer of Code student Arthur Cohen has joined us at Embecosm (Germany), working on the compiler full time. With the additional resources, we can split up work and delegate tasks, allowing multiple streams of complex work to proceed in parallel. This frees Philip up to work on slices and bugs in the type system.

Concerning our macro expansion milestone, we have the building blocks in place and support most declarative macros, though there are still some quirks and bugs to work through. The remainder of this milestone will be spent on setting up the builtin macros and fixing bugs.

Other than macros, February saw a focus on code cleanup and bug fixing, which has been fruitful.

Monthly Community Call

It’s time for our next community call; feel free to join in! 🙂

Date and time: 4th March 2022 at 14h00 UTC
Agenda: https://hackmd.io/KkrXwApIQgWwaUL8fl1Yrg (please feel free to add agenda items you wish to see discussed)
Jitsi: https://meet.jit.si/ArtificialPantsFlashNeither

Completed Activities

Bug Fix canonical-path of impl blocks nested under modules PR900
Bug Fix enum discriminant values to use constexpr code PR902
Apply cfg expansion to the rest of the crate PR904
Track canonical-path and location info as part of the type-check pass PR903
Apply .c to .cc rename as part of merge from upstream PR906
Support any, not, all predicates in cfg-expansion PR907
Enable GCC self-test framework PR751
Fix diagnostic formatting issues -Wformat-diag PR908
Improve location info on match-arms PR888
Support key value pairs being passed in -frust-cfg PR909
Refactor TypeKind ToString from header to implementation PR911
Bug Fix multiple generic substitution on path expressions PR912
Support the inline attribute PR916 PR922
Reuse C/C++ front-end mark_addressable code PR917
Cleanup code generation to remove duplication PR918
Support deref_mut lang item during method resolution PR920
Add Macro expansion tests PR926
Bug Fix CFG expansion values require quotes PR931 PR935
Add cargo-gccrs to our Dockerfile PR937
Add missing location info to macros PR934 PR933 PR932
Add support for Macro Expansion PR938
Add missing location info to AST PR940
Replace lambda with std::vector reference in AST::PathPattern PR942
Add clear_errors to make parser reusable for macro expansion PR944
Support matching macro repetition rules PR950
Add name resolution for slices PR951
Repetition macros PR955 PR965 PR956
Fix unresolved test-case PR964
Refactor lang item mappings PR953
Add builtin macro framework PR969
Add support for index and Range lang items and boilerplate for slice type checking PR974
Add file! builtin macro PR970

Contributors this month

Overall Task Status

Category	Last Month	This Month	Delta
TODO	101	118	+17
In Progress	19	17	-2
Completed	273	297	+24

GitHub Issues

Test Cases

Category	Last Month	This Month	Delta
Passing	5617	6068	+451
Failed	–	–	–
XFAIL	21	21	–
XPASS	–	–	–

make check-rust

Bugs

Category	Last Month	This Month	Delta
TODO	34	40	+6
In Progress	5	5	–
Completed	102	109	+7

GitHub Bugs

Milestones Progress

Milestone	Last Month	This Month	Delta	Start Date	Completion Date	Target
Data Structures 1 – Core	100%	100%	–	30th Nov 2020	27th Jan 2021	29th Jan 2021
Control Flow 1 – Core	100%	100%	–	28th Jan 2021	10th Feb 2021	26th Feb 2021
Data Structures 2 – Generics	100%	100%	–	11th Feb 2021	14th May 2021	28th May 2021
Data Structures 3 – Traits	100%	100%	–	20th May 2021	17th Sept 2021	27th Aug 2021
Control Flow 2 – Pattern Matching	100%	100%	–	20th Sept 2021	9th Dec 2021	29th Nov 2021
Macros and cfg expansion	18%	65%	+47%	1st Dec 2021	–	28th Mar 2022
Imports and Visibility	0%	0%	–	29th Mar 2022	–	27th May 2022
Const Generics	0%	0%	–	30th May 2022	–	25th Jul 2022
Intrinsics	0%	0%	–	6th Sept 2021	–	30th Sept 2022

GitHub Milestones

Risks

Risk	Impact (1-3)	Likelihood (0-10)	Risk (I * L)	Mitigation
Rust Language Changes	3	7	21	Keep up to date with the Rust language on a regular basis
Going over target dates	3	5	15	Maintain status reports and issue tracking to stakeholders

Planned Activities

Continue work on builtin macros
Continue work on slices
Support the mutable context during type checking for dereference or array index operations
Create more good-first-PRs

Detailed changelog

Canonical-paths

We have improved our canonical-path tracking so that we can build up paths for the legacy mangling scheme. For example, impl blocks nested under modules are given an <impl ...> segment in their path.

struct Foo(i32);

mod A {
    impl Foo {
        fn test(&self) -> i32 {
            self.0
        }
    }
}

fn test() {
    let a = Foo(123);
    let b:i32 = a.test();
}

As you can see, the path is built from the crate name example -> the module A -> the impl block <impl example::Foo> -> the function name test.

i32 example::A::<impl example::Foo>::test (const struct example::Foo & const self)
{
  i32 D.85;

  D.85 = self->0;
  return D.85;
}


void example::test ()
{
  const struct example::Foo a;
  const i32 b;

  try
    {
      a.0 = 123;
      b = example::A::<impl example::Foo>::test (&a);
    }
  finally
    {
      a = {CLOBBER};
    }
}

see: https://godbolt.org/z/P94an5f5W
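To illustrate how such a path is assembled from its segments, here is a minimal sketch. The Segment enum and canonical_path function are hypothetical and only model the string shape seen in the dump above; gccrs tracks canonical paths with its own C++ data structures during type checking.

```rust
/// Hypothetical path segment kinds; gccrs's real representation differs.
enum Segment {
    Crate(String),
    Module(String),
    /// An impl block, recorded with the canonical path of its self type.
    Impl(String),
    Function(String),
}

/// Join segments with `::`, rendering impl blocks as `<impl Type>`.
fn canonical_path(segments: &[Segment]) -> String {
    segments
        .iter()
        .map(|s| match s {
            Segment::Crate(n) | Segment::Module(n) | Segment::Function(n) => n.clone(),
            Segment::Impl(ty) => format!("<impl {}>", ty),
        })
        .collect::<Vec<_>>()
        .join("::")
}
```

Given the segments example, A, an impl for example::Foo and test, this produces example::A::<impl example::Foo>::test, matching the function name in the dump.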

cfg expansion with predicates

We added support for the any, all and not predicates in cfg expansion. In this example, the all predicate requires both A and B to be specified for the method test to be compiled in.

struct Foo;
impl Foo {
    #[cfg(all(A, B))]
    fn test(&self) {}
}

fn main() {
    let a = Foo;
    a.test();
}

see: https://godbolt.org/z/sW9K19EqE
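The semantics of these predicates can be sketched as a small recursive evaluator. This is an illustration only: the Cfg enum and eval function are hypothetical names, only bare option names are modelled (no key-value pairs), and gccrs's actual implementation operates on its AST attributes in C++.

```rust
use std::collections::HashSet;

/// A hypothetical cfg predicate tree (names only, no key-value pairs).
enum Cfg {
    Name(&'static str),
    All(Vec<Cfg>),
    Any(Vec<Cfg>),
    Not(Box<Cfg>),
}

/// Evaluate a predicate against the set of enabled cfg names.
fn eval(cfg: &Cfg, enabled: &HashSet<&str>) -> bool {
    match cfg {
        Cfg::Name(n) => enabled.contains(*n),
        // all() with no arguments is true and any() with none is false,
        // matching Iterator::all / Iterator::any on an empty iterator.
        Cfg::All(preds) => preds.iter().all(|p| eval(p, enabled)),
        Cfg::Any(preds) => preds.iter().any(|p| eval(p, enabled)),
        Cfg::Not(p) => !eval(p, enabled),
    }
}
```

With only A enabled, all(A, B) evaluates to false, so the method test above would be configured out.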

Key-value cfg-expansion

Rust allows key-value pairs in cfg expressions. These are mostly associated with host, OS and CPU options such as target_os = "linux", but here is a simple example you can try in Compiler Explorer.

struct Foo;
impl Foo {
    #[cfg(A = "B")]
    fn test(&self) {}
}

fn main() {
    let a = Foo;
    a.test();
}

see: https://godbolt.org/z/7YT1jMMMz
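One way to model this is to treat the active configuration as a set that holds both bare names and key-value pairs, where a single key may appear with several values. The sketch below assumes that representation; CfgOptions and cfg_matches are hypothetical names and not gccrs's API.

```rust
use std::collections::HashSet;

/// A hypothetical set of active cfg options. A bare option such as `A`
/// is stored as ("A", None); a key-value option such as `A = "B"` is
/// stored as ("A", Some("B")). One key may appear with several values.
type CfgOptions = HashSet<(String, Option<String>)>;

/// `#[cfg(key)]` or `#[cfg(key = "value")]` holds when that exact
/// option is present in the set.
fn cfg_matches(options: &CfgOptions, key: &str, value: Option<&str>) -> bool {
    options.contains(&(key.to_string(), value.map(|v| v.to_string())))
}
```

Note that in this model a bare #[cfg(A)] does not match when only A = "B" has been set, since names and key-value pairs are distinct entries.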

inline attributes

In Rust the inline attribute takes three forms:

#[inline]
#[inline(always)]
#[inline(never)]

Inline without any option is analogous to the C inline keyword, giving the compiler a hint that this function is a good candidate for inlining. Inline always can be achieved with GCC’s always_inline attribute: https://gcc.gnu.org/onlinedocs/gcc/Inline.html. Finally, for inline never we mark the function as DECL_UNINLINABLE. The one caveat is that inlining requires optimizations to be enabled: when compiling at -O0 no inlining will occur, while at any higher level the inline pass will honour these attributes.

We have also added some simple error handling for bad inline options, such as:

#[inline(A)]
fn test() {}

test.rs:2:3: error: unknown inline option
    2 | #[inline(A)]
      |   ^

#[inline(A,B)]
fn test() {}

test.rs:5:3: error: invalid number of arguments
    5 | #[inline(A, B)]
      |   ^
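The validation rules behind these two errors can be sketched as follows. The Inline enum and parse_inline function are hypothetical, and attribute arguments are modelled as plain strings rather than gccrs's AST nodes.

```rust
/// Hypothetical representation of the three forms of `#[inline]`.
#[derive(Debug, PartialEq)]
enum Inline {
    Hint,   // #[inline]         -> a hint, like C's `inline` keyword
    Always, // #[inline(always)] -> like GCC's always_inline attribute
    Never,  // #[inline(never)]  -> the function must not be inlined
}

/// Validate the arguments of an inline attribute, reporting the same
/// kinds of errors as shown above.
fn parse_inline(args: &[&str]) -> Result<Inline, &'static str> {
    match args {
        [] => Ok(Inline::Hint),
        ["always"] => Ok(Inline::Always),
        ["never"] => Ok(Inline::Never),
        [_] => Err("unknown inline option"),
        _ => Err("invalid number of arguments"),
    }
}
```

A single unrecognised argument such as A is reported as an unknown option, while two or more arguments are rejected before their contents are inspected.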

deref_mut lang item

Work on method resolution has continued steadily, and we now support the deref_mut lang item: for methods that require a &mut self reference, we try to look up any relevant deref_mut lang item to get the indirection required from the receiver.

extern "C" {
    fn printf(s: *const i8, ...);
}
                                           
#[lang = "deref"]
pub trait Deref {
    type Target;

    fn deref(&self) -> &Self::Target;
}

#[lang = "deref_mut"]
pub trait DerefMut: Deref {
    fn deref_mut(&mut self) -> &mut Self::Target;
}

impl<T> Deref for &T {
    type Target = T;

    fn deref(&self) -> &T {
        *self
    }
}

impl<T> Deref for &mut T {
    type Target = T;
    fn deref(&self) -> &T {
        *self
    }
}

pub struct Bar(i32);      
impl Bar {
    pub fn foobar(&mut self) -> i32 {
        self.0  
    }
}

pub struct Foo<T>(T);
impl<T> Deref for Foo<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl<T> DerefMut for Foo<T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        unsafe {
            let a = "mut_deref\n\0";
            let b = a as *const str;
            let c = b as *const i8;

            printf(c);
        }

        &mut self.0
    }
}

pub fn main() -> i32 {
    let bar = Bar(123);
    let mut foo: Foo<Bar> = Foo(bar);
    let foobar = foo.foobar();

    foobar - 123
}

See https://godbolt.org/z/xcM9ohcjK
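The test above defines its own lang items because it is built without core. With the standard library's Deref and DerefMut traits, the same receiver adjustment can be written out by hand. The sketch below compares the implicit method call with an explicit call to deref_mut. This shows the equivalent surface-level Rust, not gccrs's internal representation of the adjustment.

```rust
use std::ops::{Deref, DerefMut};

pub struct Bar(i32);
impl Bar {
    pub fn foobar(&mut self) -> i32 {
        self.0
    }
}

pub struct Foo<T>(T);
impl<T> Deref for Foo<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}
impl<T> DerefMut for Foo<T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.0
    }
}

/// Implicit form: `foobar` takes `&mut self`, so method resolution
/// reaches `Bar::foobar` through the `DerefMut` impl.
pub fn implicit() -> i32 {
    let mut foo = Foo(Bar(123));
    foo.foobar()
}

/// The same call with the receiver adjustment spelled out by hand.
pub fn explicit() -> i32 {
    let mut foo = Foo(Bar(123));
    Bar::foobar(DerefMut::deref_mut(&mut foo))
}
```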

Declarative Macro Expansion

We have merged the first version of our macro expansion pass. The approach taken is to reuse our existing parser: for each fragment in a rule's matcher, we call the appropriate parsing function as specified by the MacroFragmentType enum, and if the parser parses that item without errors, then the fragment is a match. Once a rule matches, we have a map of the token begin/end offsets for each matched fragment. This map is then used to build a new token stream from the macro rule's transcriber, so that when we feed it to the parser the tokens are already substituted. The resulting expression or item is then attached to the respective macro invocation, which is then name-resolved and used for HIR lowering.

In this example the macro has two rules, demonstrating that we match the appropriate rule and transcribe it accordingly.

macro_rules! add {
    ($a:expr,$b:expr) => {
        $a + $b
    };
    ($a:expr) => {
        $a
    };
}

fn main() -> i32 {
    let mut x = add!(1);
    x += add!(2, 3);

    x - 6
}
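The offset-based substitution step can be sketched as follows for the add!(2, 3) invocation, assuming tokens are plain strings and that each matched fragment is recorded as a half-open begin..end range into the invocation's tokens. The substitute function and the fragments map are hypothetical names; gccrs works on its own C++ token and AST types.

```rust
use std::collections::HashMap;

/// Build the expanded token stream: every `$name` in the transcriber is
/// replaced by the invocation tokens in the half-open range
/// `begin..end` recorded for that fragment when the rule matched.
fn substitute<'a>(
    transcriber: &[&'a str],
    invocation: &[&'a str],
    fragments: &HashMap<&str, (usize, usize)>,
) -> Vec<&'a str> {
    let mut out = Vec::new();
    for &tok in transcriber {
        match tok.strip_prefix('$').and_then(|name| fragments.get(name)) {
            Some(&(begin, end)) => out.extend_from_slice(&invocation[begin..end]),
            None => out.push(tok),
        }
    }
    out
}
```

For add!(2, 3), the invocation tokens are 2 , 3, with $a bound to 0..1 and $b to 2..3. Substituting into the transcriber $a + $b yields 2 + 3, which is then handed to the parser.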

Another example:

macro_rules! Test {
    ($a:ident, $b:ty) => {
        struct $a($b);
    };
}

Test!(Foo, i32);

fn main() -> i32 {
    let a = Foo(123);
    a.0 - 123
}

Here we take into account the context of the macro invocation and parse the expansion as AST::Items. In the event of a failure to match any rule, the compiler error looks like the following:

<source>:11:17: error: Failed to match any rule within macro
    1 | macro_rules! add {
      | ~                
......
   11 |     let mut x = add!(1, 2, 3);
      |                 ^

We have also added error handling for when the transcribed tokens are not fully consumed by the parser, for example:

<source>:4:9: error: tokens here and after are unparsed
    4 |         struct BAD($b);
      |         ^

see: https://godbolt.org/z/TK3qdG56n

Range Lang items

In Rust, ranges are desugared into structs, so what looks like a piece of syntax specifying some kind of constraint is actually a value that can be assigned and manipulated. This is one of the building blocks in our journey to support slices.

#[lang = "RangeFull"]
pub struct RangeFull;

#[lang = "Range"]
pub struct Range<Idx> {
    pub start: Idx,
    pub end: Idx,
}

#[lang = "RangeFrom"]
pub struct RangeFrom<Idx> {
    pub start: Idx,
}

#[lang = "RangeTo"]
pub struct RangeTo<Idx> {
    pub end: Idx,
}

#[lang = "RangeInclusive"]
pub struct RangeInclusive<Idx> {
    pub start: Idx,
    pub end: Idx,
}

fn test() {
    let a = 1..2; // range
    let b = 1..; // range from
    let c = ..3; // range to
    let d = 0..=2; // range inclusive
}

See: https://doc.rust-lang.org/std/ops/struct.Range.html
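The desugaring can be checked directly against the standard library's core::ops types (re-exported as std::ops). Each range expression below is compared with the struct value it produces. The function name range_values is only for illustration.

```rust
use std::ops::{Range, RangeFrom, RangeFull, RangeInclusive, RangeTo};

/// Each range expression is just a value of a struct from core::ops.
fn range_values() -> (Range<i32>, RangeFrom<i32>, RangeTo<i32>, RangeInclusive<i32>, RangeFull) {
    let a = 1..2;  // Range { start: 1, end: 2 }
    let b = 1..;   // RangeFrom { start: 1 }
    let c = ..3;   // RangeTo { end: 3 }
    let d = 0..=2; // RangeInclusive::new(0, 2)
    let e = ..;    // RangeFull
    (a, b, c, d, e)
}
```

RangeInclusive is built through RangeInclusive::new because its fields are private in the standard library.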

Index Lang items

Another building block for slices is support for the index lang item (core::ops::Index), so that a range can be used as the index argument and the code in core::slice::index can become the starting point for taking a slice of an array.

#[lang = "index"]
trait Index<Idx> {
    type Output;

    fn index(&self, index: Idx) -> &Self::Output;
}

struct Foo(i32, i32);
impl Index<isize> for Foo {
    type Output = i32;

    fn index(&self, index: isize) -> &i32 {
        if index == 0 {
            &self.0
        } else {
            &self.1
        }
    }
}

fn main() -> i32 {
    let a = Foo(1, 2);
    let b = a[0];
    let c = a[1];

    c - b - 1
}

See: https://doc.rust-lang.org/core/ops/trait.Index.html
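To show why combining the two lang items matters for slices, here is a minimal sketch using the standard library. Indexing a type with a Range<usize> returns an unsized [i32], so buf[1..3] is a slice. Buffer is a hypothetical type, and the impl simply delegates to the array's own range indexing.

```rust
use std::ops::{Index, Range};

/// A hypothetical fixed-size buffer; indexing it with a Range<usize>
/// yields a slice, much like core does for arrays and slices.
struct Buffer([i32; 4]);

impl Index<Range<usize>> for Buffer {
    type Output = [i32];

    fn index(&self, range: Range<usize>) -> &[i32] {
        // Delegates to the built-in range indexing, which bounds-checks.
        &self.0[range]
    }
}
```

For example, &Buffer([10, 20, 30, 40])[1..3] evaluates to the slice [20, 30].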

Repetition Macros

Matching macro repetitions

Macro match arms can contain repetition operators, which indicate the possibility of passing multiple instances of a single token or metavariable.

Such repetitions are denoted using Kleene operators. Three variants are available: ?, + and *. Each places different bounds on the number of repetitions, similarly to regular expressions.

macro_rules! kleene {
    ($a:ident $(,)?) => {{ }};
    ($($i:literal tok)+) => {{ }};
    ($($e:expr),*) => {{ }};
}

The declaration above contains three rules, each matching a different shape of invocation:

A single identifier, followed by zero or one commas (pattern: <comma>, Kleene operator: ? (0 -> 1))
One or more literals, each followed by the token tok (pattern: $i:literal tok, Kleene operator: + (1 -> +inf))
Zero or more comma-separated expressions (pattern: $e:expr, Kleene operator: * (0 -> +inf))
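The bounds listed above can be expressed as a (minimum, maximum) pair per operator. The function names kleene_bounds and count_allowed in this sketch are hypothetical and do not come from gccrs.

```rust
/// The bounds each Kleene operator places on the number of repetitions:
/// (minimum, maximum), where a maximum of None means unbounded.
fn kleene_bounds(op: char) -> Option<(usize, Option<usize>)> {
    match op {
        '?' => Some((0, Some(1))),
        '*' => Some((0, None)),
        '+' => Some((1, None)),
        _ => None, // not a Kleene operator
    }
}

/// Whether `count` repetitions are acceptable for operator `op`.
fn count_allowed(op: char, count: usize) -> bool {
    match kleene_bounds(op) {
        Some((min, max)) => count >= min && max.map_or(true, |m| count <= m),
        None => false,
    }
}
```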

The first step in implementing macro repetitions is matching the patterns given by the user. We are now able to match simple repetitions, though a few limitations and bugs remain. For example, the Rust reference restricts which tokens may follow each fragment specifier, which we do not check yet: it is forbidden to place an identifier after an $<>:expr fragment, since that could cause ambiguity, so the only tokens allowed after an expression are =>, <comma> and ;.

See: https://doc.rust-lang.org/reference/macros-by-example.html#follow-set-ambiguity-restrictions
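As a rough sketch of what such a check could look like, the function below covers only three fragment kinds and follows the reference's rules for them. The Fragment enum and can_follow function are hypothetical, tokens are modelled as strings, and the other specifiers (pat, path, ty, vis) have their own follow sets that are omitted here.

```rust
/// A few fragment specifiers (hypothetical subset).
enum Fragment {
    Expr,
    Stmt,
    Ident,
}

/// Whether `next` may directly follow a fragment of kind `frag` in a
/// matcher, per the follow-set restrictions in the Rust reference.
/// Only the simplest cases are covered here.
fn can_follow(frag: &Fragment, next: &str) -> bool {
    match frag {
        // expr and stmt may only be followed by `=>`, `,` or `;`.
        Fragment::Expr | Fragment::Stmt => matches!(next, "=>" | "," | ";"),
        // ident has no follow-set restriction.
        Fragment::Ident => true,
    }
}
```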

Once those repetition patterns are matched, it is easy to figure out how many repetitions of each pattern were given by the user. We store this count alongside the rest of the fragment, to make sure that we expand the pattern the correct number of times when transcribing.

Given the following match arm:

macro_rules! lit_plus_tok {
    ($($e:literal tok)*) => {}
}

And the following invocation:

lit_plus_tok!("rustc" tok 'v' tok 1.59 tok);

We will have matched the repetition 3 times, and attributed a repetition count of 3 to the $e metavariable.

See: https://doc.rust-lang.org/rust-by-example/macros/repeat.html and https://doc.rust-lang.org/reference/macros-by-example.html#repetitions
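Matching and counting can be sketched for this specific matcher, $($e:literal tok)*. In this sketch, tokens are plain strings, literal detection is deliberately crude, and both functions are hypothetical stand-ins for gccrs's parser-driven matching.

```rust
/// Very rough literal check, for illustration only: string, char or
/// numeric literals written as plain token text.
fn is_literal(tok: &str) -> bool {
    tok.starts_with('"')
        || tok.starts_with('\'')
        || tok.chars().next().map_or(false, |c| c.is_ascii_digit())
}

/// Match `$($e:literal tok)*` against `tokens`, returning the literals
/// bound to `$e`; their count is the repetition amount. Returns None if
/// the tokens do not fit the pattern.
fn match_lit_plus_tok<'a>(tokens: &[&'a str]) -> Option<Vec<&'a str>> {
    let mut bound = Vec::new();
    let mut rest = tokens;
    // `*` allows zero repetitions, so an empty input matches.
    while let [lit, "tok", tail @ ..] = rest {
        if !is_literal(lit) {
            return None;
        }
        bound.push(*lit);
        rest = tail;
    }
    if rest.is_empty() { Some(bound) } else { None }
}
```

For the invocation above, $e is bound to "rustc", 'v' and 1.59, giving a repetition count of 3.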

Expanding macro repetitions

Following the matching of these repetitions, we can recursively expand all tokens contained in the pattern.

Considering once again the previous declaration and invocation, we can parse the following pattern as the one to expand:

($e:literal tok)

This pattern is then recursively expanded as if it was a regular macro invocation. In order to make sure that each metavariable gets expanded correctly, we only give a subset of the matched fragments to the new substitution context.

macro_rules! lit_plus_tok {
    ($($e:literal tok)*) => {}
}

lit_plus_tok!("rustc" tok 'v' tok 1.59 tok);

// Original matched fragments: { "e": ["rustc", 'v', 1.59] }
// We then expand the repetition pattern once with { "e": ["rustc"] },
// once with { "e": ['v'] },
// and finally once with { "e": [1.59] }.

Once again, certain restrictions apply that we have yet to implement: some fragment specifiers get expanded eagerly, while others stay in the form the user wrote them.

See: https://doc.rust-lang.org/reference/macros-by-example.html#transcribing

Likewise, not all repetition patterns are covered properly. Some issues remain to be ironed out for a complete and correct implementation.
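The per-repetition sub-context described above can be sketched as follows, assuming a transcriber body of $e tok, metavariables that each matched exactly count times, and tokens modelled as plain strings. Fragments and expand_repetition are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical matched fragments: each metavariable maps to the tokens
/// it matched, one entry per repetition.
type Fragments<'a> = HashMap<&'a str, Vec<&'a str>>;

/// Expand a repetition whose body is `pattern`, once per matched
/// repetition. Each pass gets a sub-context holding only the i-th match
/// of every metavariable, so `$e` expands to exactly one value.
/// Assumes every metavariable matched at least `count` times.
fn expand_repetition<'a>(pattern: &[&'a str], frags: &Fragments<'a>, count: usize) -> Vec<&'a str> {
    let mut out = Vec::new();
    for i in 0..count {
        // Build the sub-context for this repetition.
        let sub: HashMap<&'a str, &'a str> =
            frags.iter().map(|(name, vals)| (*name, vals[i])).collect();
        for &tok in pattern {
            match tok.strip_prefix('$').and_then(|name| sub.get(name)) {
                Some(&val) => out.push(val),
                None => out.push(tok),
            }
        }
    }
    out
}
```

With $e bound to "rustc", 'v' and 1.59, expanding $e tok three times reproduces "rustc" tok 'v' tok 1.59 tok.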

Builtin macros

In order to implement some specific behaviour, the Rust standard library requires some macros to be built into the compiler. You can find a full list here.

gccrs needs to implement these builtin macros to allow for the compilation of the Rust standard library, as both core and std depend on a multitude of them.

These macros are defined as empty within the core library, and their transcriber is provided in the compiler as a simple function. We implement those builtins in gccrs as functions returning fragments of abstract syntax trees, which are inserted during the macro-expansion phase and then lowered to an intermediate representation alongside the rest of the user’s code.
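The idea of a builtin transcriber being "just a function" can be sketched as follows, using file! as the example. Every type here is a hypothetical, heavily simplified stand-in: gccrs's real AST fragments and locations are C++ classes, and its builtin registry may look quite different.

```rust
/// Hypothetical, heavily simplified AST fragment produced by a builtin.
#[derive(Debug, PartialEq)]
enum AstFragment {
    StrLiteral(String),
}

/// Hypothetical invocation site information.
struct Location {
    file: String,
}

/// Signature shared by all builtin transcribers in this sketch.
type Transcriber = fn(&Location) -> AstFragment;

/// `file!()` expands to a string literal naming the invocation's file.
fn expand_file(invoc: &Location) -> AstFragment {
    AstFragment::StrLiteral(invoc.file.clone())
}

/// Look up the transcriber for a builtin macro by name.
fn lookup_builtin(name: &str) -> Option<Transcriber> {
    match name {
        "file" => Some(expand_file as Transcriber),
        _ => None,
    }
}
```

During expansion, an invocation of a macro marked as builtin would be routed to its transcriber instead of the usual rule matching. The returned fragment is then lowered like any user-written code.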

We have a long list of macros ahead of us, some of which we should be able to implement easily. If you are interested in contributing, we have opened 3 good first issues regarding builtin macros with detailed guides on how to solve them.

Thanks a lot to bjorn3 for all the help regarding builtin macros and their implementation details.