GCC Rust Monthly Report #14 February 2022

Thanks again to Open Source Security, inc and Embecosm for their ongoing support for this project.

Milestone Progress

February was a big month for GCC Rust; our previous Google Summer of Code student Arthur Cohen joins us in Embecosm, Germany, working on the compiler full time. With the additional resource, we can split up work and delegate tasks allowing multiple streams of complex work to take place, which frees Philip up to work on Slice’s and bugs in the type system.

Concerning our macro expansion milestone, we have the building blocks in place and support most declarative macros though there are some quirks and bugs to work through still. The remaining time of this milestone will be spent on setting up the builtin macros and fixing bugs.

Other than macros, there has been a focus on code cleanup and bug fixing in February, which has been fruitful.

Monthly Community Call

It’s time for our next community call, feel free to join in! 🙂

Completed Activities

  • Bug Fix canonical-path of impl blocks nested under modules PR900
  • Bug Fix enum discriminant values to use constexpr code PR902
  • Apply cfg expansion to the rest of the crate PR904
  • Track canonical-path and location into as part of the type-check pass PR903
  • Apply .c to .cc rename as part of merge from upstream PR906
  • Support any,not,all predicates in cfg-expansion PR907
  • Enable GCC self-test framework PR751
  • Fix diagnostic formatting issues -Wformat-diag PR908
  • Improve location info on match-arms PR888
  • Support key value pairs being passed in -frust-cfg PR909
  • Refactor TypeKind ToString from header to implementation PR911
  • Bug Fix multiple generic substitution on path expressions PR912
  • Support the inline attribute PR916 PR922
  • Reuse C/C++ front-end mark_addressable code PR917
  • Cleanup code generation to remove duplication PR918
  • Support deref_mut lang item during method resolution PR920
  • Add Macro expansion tests PR926
  • BugFix CFG expansion values require quotes PR931 PR935
  • Add cargo-gccrs to our DockerFile PR937
  • Add missing location into to macros PR934 PR933 PR932
  • Add support for Macro Expansion PR938
  • Add missing location into to AST PR940
  • replace lambda with std::vector reference in AST::PathPattern PR942
  • Add clear_errors to make parser reuseable for macro expansion PR944
  • Support matching macro repetition rules PR950
  • Add name resolution for slice’s PR951
  • Repetition macros PR955 PR965 PR956
  • Fix unresolved test-case PR964
  • Refactor lang item mappings PR953
  • Add builtin macro framework PR969
  • Add Support for index and Range lang items and boiler plate for Slice typechecking PR974
  • Add file! builtin macro PR970

Contributors this month

Overall Task Status

CategoryLast MonthThis MonthDelta
TODO101118+17
In Progress1917-2
Completed273297+24
GitHub Issues

Test Cases

CategoryLast MonthThis MonthDelta
Passing56176068+451
Failed
XFAIL2121
XPASS
make check-rust

Bugs

CategoryLast MonthThis MonthDelta
TODO3440+6
In Progress55
Completed102109+7
GitHub Bugs

Milestones Progress

MilestoneLast MonthThis MonthDeltaStart DateCompletion DateTarget
Data Structures 1 – Core100%100%30th Nov 202027th Jan 202129th Jan 2021
Control Flow 1 – Core100%100%28th Jan 202110th Feb 202126th Feb 2021
Data Structures 2 – Generics100%100%11th Feb 202114th May 202128th May 2021
Data Structures 3 – Traits100%100%20th May 202117th Sept 202127th Aug 2021
Control Flow 2 – Pattern Matching100%%10020th Sept 20219th Dec 202129th Nov 2021
Macros and cfg expansion18%65%+47%1st Dec 202128th Mar 2022
Imports and Visibility0%0%29th Mar 202227th May 2022
Const Generics0%0%30th May 202225th Jul 2022
Intrinsics0%0%6th Sept 202130th Sept 2022
GitHub Milestones

Risks

RiskImpact (1-3)Likelihood (0-10)Risk (I * L)Mitigation
Rust Language Changes3721Keep up to date with the Rust language on a regular basis
Going over target dates3515Maintain status reports and issue tracking to stakeholders

Planned Activities

  • Continue work on builtin macros
  • Continue work into Slices
  • Support the mutable context during type checking for dereference or array index operations
  • Create more good-first-pr’s

Detailed changelog

Canonical-paths

We have improved our canonical-path tracking so that we can build up paths for the legacy mangling scheme. So for example impl blocks nested under modules are given a prefix of impl in their path.

struct Foo(i32);

mod A {
    impl Foo {
        fn test(&self) -> i32 {
            self.0
        }
    }
}

fn test() {
    let a = Foo(123);
    let b:i32 = a.test();
}

As you can see we have the crate-name of example -> structure A -> impl block for example::A -> function name test.

i32 example::A::<impl example::Foo>::test (const struct example::Foo & const self)
{
  i32 D.85;

  D.85 = self->0;
  return D.85;
}


void example::test ()
{
  const struct example::Foo a;
  const i32 b;

  try
    {
      a.0 = 123;
      b = example::A::<impl example::Foo>::test (&a);
    }
  finally
    {
      a = {CLOBBER};
    }
}

see: https://godbolt.org/z/P94an5f5W

cfg expansion with predicates

We added support for any, all and not predicates on cfg expansions so in this example this ensures that both A and B are specified for the all predicate.

struct Foo;
impl Foo {
    #[cfg(all(A, B))]
    fn test(&self) {}
}

fn main() {
    let a = Foo;
    a.test();
}

see: https://godbolt.org/z/sW9K19EqE

Key-value cfg-expansion

Rust allows us to specify key-value pairs for config expansion this is mostly associated with host/os/cpu options such as os = “linux” for example but here is an example below you can try in compiler explorer.

struct Foo;
impl Foo {
    #[cfg(A = "B")]
    fn test(&self) {}
}

fn main() {
    let a = Foo;
    a.test();
}

see: https://godbolt.org/z/7YT1jMMMz

inline attributes

In Rust the inline attribute takes three forms:

  • #[inline]
  • #[inline(always)]
  • #[inline(never)]

Inline without any option is analogous to C style inline keyword giving a hint to the compiler that this function is a good candidate for inlining. Inline always can be acheived with GCC’s inline always attribute: https://gcc.gnu.org/onlinedocs/gcc/Inline.html. Finally never we can mark functions as DECL_UNINLINEABLE. The one difference is that inline optimizations require optimizations to be enabled. So when compiling at -O0 no inlining will occur, any level greater than this, the inline pass will be enforced.

We have always added some simple error handling for bad inline options such as:

#[inline(A)]
fn test() {}
test.rs:2:3: error: unknown inline option
    2 | #[inline(A)]
      |   ^
#[inline(A,B)]
fn test() {}
test.rs:5:3: error: invalid number of arguments
    5 | #[inline(A, B)]
      |   ^

deref_mut lang item

Work on method resolution has continued steadily and we now support the deref_mut lang item so that for methods that require a &mut self reference we try to lookup any relevant deref_mut lang item to get the indirection required from the receiver.

extern "C" {
    fn printf(s: *const i8, ...);
}
                                           
#[lang = "deref"]
pub trait Deref {
    type Target;

    fn deref(&self) -> &Self::Target;
}

#[lang = "deref_mut"]
pub trait DerefMut: Deref {
    fn deref_mut(&mut self) -> &mut Self::Target;
}

impl<T> Deref for &T {
    type Target = T;

    fn deref(&self) -> &T {
        *self
    }
}

impl<T> Deref for &mut T {
    type Target = T;
    fn deref(&self) -> &T {
        *self
    }
}

pub struct Bar(i32);      
impl Bar {
    pub fn foobar(&mut self) -> i32 {
        self.0  
    }
}

pub struct Foo<T>(T);
impl<T> Deref for Foo<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl<T> DerefMut for Foo<T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        unsafe {
            let a = "mut_deref\n\0";
            let b = a as *const str;
            let c = b as *const i8;

            printf(c);
        }

        &mut self.0
    }
}

pub fn main() -> i32 {
    let bar = Bar(123);
    let mut foo: Foo<Bar> = Foo(bar);
    let foobar = foo.foobar();

    foobar - 123
}

See https://godbolt.org/z/xcM9ohcjK

Declarative Macro Expansion

We have merged our first pass of the macro expansion pass. The approach taken here is that we reuse our existing parser to call the apropriate functions as specified as part of the MacroFragmentType enum if the parser does not have errors parsing that item then it must be a match. Then once we match a rule we have a map of the token begin/end offsets for each fragment match, this is then used to adjust and create a new token stream for the macro rule definition so that when we feed it to the parser the tokens are already substituted. The resulting expression or item is then attached to the respective macro invocation and this is then name resolved and used for hir lowering.

In this example the macro has two rules so we demonstrate that we match the apropriate rule and transcribe it respectively.

macro_rules! add {
    ($a:expr,$b:expr) => {
        $a + $b
    };
    ($a:expr) => {
        $a
    };
}

fn main() -> i32 {
    let mut x = add!(1);
    x += add!(2, 3);

    x - 6
}

Another exmaple:

macro_rules! Test {
    ($a:ident, $b:ty) => {
        struct $a($b);
    };
}

Test!(Foo, i32);

fn main() -> i32 {
    let a = Foo(123);
    a.0 - 123
}

Here we take into account the context of the macro invocation and parse it into AST::Items. In the even of failure to match a rule the compiler error looks like the following:

<source>:11:17: error: Failed to match any rule within macro
    1 | macro_rules! add {
      | ~                
......
   11 |     let mut x = add!(1, 2, 3);
      |                 ^

More error handling has been added for when the transcribed rule actually is not fully used so for example:

<source>:4:9: error: tokens here and after are unparsed
    4 |         struct BAD($b);
      |         ^

see: https://godbolt.org/z/TK3qdG56n

Range Lang items

In rust ranges are turned into structs so what seems like piece of syntax to specify some kind of constraint is actually something which can be assigned and manipulated. This is one of the building blocks in our journey to support slices.

#[lang = "RangeFull"]
pub struct RangeFull;

#[lang = "Range"]
pub struct Range<Idx> {
    pub start: Idx,
    pub end: Idx,
}

#[lang = "RangeFrom"]
pub struct RangeFrom<Idx> {
    pub start: Idx,
}

#[lang = "RangeTo"]
pub struct RangeTo<Idx> {
    pub end: Idx,
}

#[lang = "RangeInclusive"]
pub struct RangeInclusive<Idx> {
    pub start: Idx,
    pub end: Idx,
}

fn test() {
    let a = 1..2; // range
    let b = 1..; // range from
    let c = ..3; // range to
    let d = 0..=2; // range inclusive
}

See: https://doc.rust-lang.org/std/ops/struct.Range.html

Index Lang items

Another building block to support slices is the ability to suport the index lang item core::ops::index so that a range can be an argument and the code in core::slice::index can actually become the starting point in giving us a slice from an array.

#[lang = "index"]
trait Index<Idx> {
    type Output;

    fn index(&self, index: Idx) -> &Self::Output;
}

struct Foo(i32, i32);
impl Index<isize> for Foo {
    type Output = i32;

    fn index(&self, index: isize) -> &i32 {
        if index == 0 {
            &self.0
        } else {
            &self.1
        }
    }
}

fn main() -> i32 {
    let a = Foo(1, 2);
    let b = a[0];
    let c = a[1];

    c - b - 1
}

See: https://doc.rust-lang.org/core/ops/trait.Index.html

Repetition Macros

Matching macro repetitions

Macro match arms can contain repetition operators, which indicate the possibilty of passing multiple instances of a single token or metavariable.

You can denote such repetitions using Kleene operators: Three variants are available, ?, + and *. Each corresponds to varying bounds on the amount of tokens associated with the operator, similarly to regular expressions.

macro_rules! kleene {
    ($a:ident $(,)?) => {{ }};
    ($($i:literal tok)+) => {{ }};
    ($($e:expr)*) => {{ }};
}

The declaration above contains three possible matching invocations:

  1. Either a singular identifier, with zero or one commas (pattern: <comma>, kleene operator: ? (0 -> 1))
  2. One or more literal followed by the separator tok (pattern $i:literal tok, kleene operator: + (1 -> +inf))
  3. Zero or more expressions tok (pattern $e:expr, kleene operator: * (0 -> +inf))

The first of implementing macro repetitions comes in matching the actual patterns given to the users. We are now able to match simple repetitions, with a few limitations and bugs still. For example, the Rust reference specifies valid separators to use after fragment specifiers, which we do not check yet. It is for example forbidden to add an identifier after an $<>:expr specifier, since that could cause ambiguity: The only allowed separators after an expression are thus =>, <comma> or ;.

See: https://doc.rust-lang.org/reference/macros-by-example.html#follow-set-ambiguity-restrictions

Once those repetition patterns are matched, it is easy to figure out how many repetitions of said pattern were given by the user. We store this data alongside the rest of the fragment, to make sure that we expand said pattern a correct amount of times when transcribing.

Given the following match arm:

macro_rules! lit_plus_tok {
    ($($e:literal tok)*) => {}
}

And the following invocation:

lit_plus_tok!("rustc" tok 'v' tok 1.59 tok);

we will have matched the repetition 3 times, and attributed a repetition amount of 3 to the $e meta-variable.

See: https://doc.rust-lang.org/rust-by-example/macros/repeat.html and https://doc.rust-lang.org/reference/macros-by-example.html#repetitions

Expanding macro repetitions

Following the matching of these repetitions, we can recursively expand all tokens contained in the pattern.

Considering once again the previous declaration and invocation, we can parse the following pattern as the one to expand:

($e:literal tok)

This pattern is then recursively expanded as if it was a regular macro invocation. In order to make sure that each meta-variable gets expanded correctly, we only give a subset of the matched fragments to the new subsitution context.

macro_rules! lit_plus_tok {
    ($($e:literal tok)*) => {}
}

lit_plus_tok!("rustc" tok 'v' tok 1.59 tok);

// Original matched fragments: { "lit": ["rustc", 'v', 1.59] }
// We then expand the repetition pattern once with { "lit": ["rustc"] },
// once with { "lit": ['v'] },
// and finally once with { "lit": [1.59] },

Once again, certain restrictions apply, which we have yet to implement: Some specifiers get expanded eagerly, while some stay under the form inputted by the user.

See: https://doc.rust-lang.org/reference/macros-by-example.html#transcribing

Likewise, not all repetition patterns are covered properly. Some issues remain to be ironed out for a complete and correct implementation.

Builtin macros

In order to implement some specific behaviour, the rust standard library requires some macros to be built into the compiler. You can find a full list here.

gccrs should implement to allow for the compilation of the standard rust library, as both core and std depend on a multitude of them.

These macros are defined as empty within the core library, and their transcriber is provided in the compiler as a simple function. We implement those builtins in gccrs as functions returning fragments of abstract syntax trees, which are inserted during the macro-expansion phase and then lowered to an intermediate representation alongside the rest of the user’s code.

We have a long list of macros ahead of us, some of which we should be able to implement easily. If you are interested in contributing, we have opened 3 good first issues regarding builtin macros with detailed guides on how to solve them.

Thanks a lot to bjorn3 for all the help regarding builtin macros and their implementation details.

Leave a Reply

Your email address will not be published.