GCC Rust Weekly Status Report 44

Thanks again to Open Source Security, inc and Embecosm for their ongoing support for this project.

Milestone Progress

We are approaching the end of macro expansion, we have one outstanding issue on how we handle expansion around types. Apart from that, we have some small, non-blocking macro issues marked with the good-first-pr label for new contributors to make their mark. Feel free to ask about them on our Zulip or GitHub!

Macros have gotten more correct as more restrictions have been implemented, such as the verification of follow-set ambiguities. Rust 1.49.0 slices will soon be entirely supported, with only some cleanup of the various implementations remaining to be performed. We will start drafting some issues for the next milestone, which concerns Imports and Visibility. This includes source-code visibility (pub, pub(crate), …) as well as metadata exports and proper handling of .rlib and .rmeta files.

Finally, we have also started looking into improving our set of CI checks: Thanks to a new contributor Daniel del Castillo, we can now easily check that our frontend does not introduce new warnings, which would contradict with gcc’s expectations of a full boostrapping build. We are also working on getting gccrs compilable using an older gcc version (4.8), which is a requirement for upstreaming as well as backporting.

Monthly Community Call

We will be having our regular community call as the first Friday of the month:

Completed Activities

  • Enable -Werror in CI PR1026
  • Do not propagate parser errors in match_repetitions PR1040
  • Only expand merged repetitions if they contain the same amount PR1041
  • Implement include_bytes! and include_str! PR1043
  • Restrict follow up tokens on :expr and :stmt PR1044
  • Add helper function for subsituted tokens debugging PR1047
  • Add better restrictions around semicolons in statements parsing PR1049
  • Add remaining restrictions for follow-set restrictions PR1051
  • Add hints for valid follow-set tokens PR1052
  • Fix overzealous follow-set ambiguity PR1054
  • Allow checking past zeroable matches for follow-set restrictions PR1055
  • Fix #include <algorithm> PR1056
  • Provide std::hash for Rust::AST::MacroFragSpec::Kind enum class PR1057
  • Properly perform follow-set checking on matchers PR1062
  • Handle :tt fragments properly PR1064
  • Handle :meta fragments properly PR1063

Contributors this week

Overall Task Status

CategoryLast WeekThis WeekDelta
TODO110105-5
In Progress1823+5
Completed327336+9
GitHub Issues

Test Cases

CategoryLast WeekThis WeekDelta
Passing55115677+166
Failed
XFAIL2222
XPASS
make check-rust

Bugs

CategoryLast WeekThis WeekDelta
TODO3435+1
In Progress810+2
Completed125129+4
GitHub Bugs

Milestones Progress

MilestoneLast WeekThis WeekDeltaStart DateCompletion DateTarget
Data Structures 1 – Core100%100%30th Nov 202027th Jan 202129th Jan 2021
Control Flow 1 – Core100%100%28th Jan 202110th Feb 202126th Feb 2021
Data Structures 2 – Generics100%100%11th Feb 202114th May 202128th May 2021
Data Structures 3 – Traits100%100%20th May 202117th Sept 202127th Aug 2021
Control Flow 2 – Pattern Matching100%100%20th Sept 20219th Dec 202129th Nov 2021
Macros and cfg expansion87%99%+12%1st Dec 202128th Mar 2022
Imports and Visibility0%0%29th Mar 202227th May 2022
Const Generics0%0%30th May 202225th Jul 2022
Intrinsics0%0%6th Sept 202130th Sept 2022
GitHub Milestones

Risks

RiskImpact (1-3)Likelihood (0-10)Risk (I * L)Mitigation
Rust Language Changes3721Keep up to date with the Rust language on a regular basis
Going over target dates2510Maintain status reports and issue tracking to stakeholders

Planned Activities

  • Finish working out the various quirks of macros
  • Make sure follow-set ambiguities are implemented properly
  • Merge unsized method resolution
  • Handle macro opacity properly
  • Plan out next milestone

Detailed changelog

Two new macro builtins have been added to the compiler thanks to David Faust: include_bytes! and include_str!. They allow the user to include files at compilation time, either as bytes or valid UTF-8 strings. This can be extremely useful for anyone dealing with binary blobs, and adds even more code for new contributors to reuse when adding more builtin macros.

Their definition is as follows:

macro_rules! include_str {
    ($file:expr $(,)?) => {{ /* compiler built-in */ }};
}
macro_rules! include_bytes {
    ($file:expr $(,)?) => {{ /* compiler built-in */ }};
}

Follow-set ambiguities

While rust macros are extremely powerful, they are also heavily restricted to prevent ambiguities. These restrictions include sets of allowed fragments that can follow a certain metavariable fragment, which are referred to as follow-sets.

As an example, the follow set of :expr fragments is { COMMA, SEMICOLON, MATCH_ARROW }. Any other token cannot follow an :expr fragment, as it might cause ambiguities in later versions of the language.

This was previously not handled by gccrs at all. As a result, we had some test cases that contained ambiguous macro definitions that rustc rejected.

We dedicated some time this week to implement (almost!) all of these restrictions, including some complex cases involving repetitions:

Looking past zeroable repetitions

macro_rules! invalid {
  ($e:expr $(,)? $(;)* $(=>)* forbidden) => {{}};
  //  1      2     3     4        5         (matches)
}

Since matches 2, 3 and 4 might occur zero times (kleene operators * or ?), we need to check that the forbidden token is allowed to follow an :expr fragment, which is not the case since identifier tokens are not contained in its follow-set.

On the other hand, this macro is perfectly valid since a comma, contained in the follow-set of :expr, is required to appear at least once before any forbidden tokens

macro_rules! invalid {
  ($e:expr $(;)* $(,)+ $(=>)* forbidden) => {{}};
  // `+` kleen operator indicates one or more, meaning that there will always be at least one comma
}

Metavar fragments following other metavar fragments

macro_rules! mac {
  ($t:ty $lit:literal) => {{}}; // invalid
  ($t:ty $lit:block) => {{}}; // valid
}

The follow-set of :ty fragments allows the user to specify another fragment as follow-up, but only if this metavar fragment is a :block one.

An interesting tidbit is that these checks are performed at the beginning of the expansion phase in rustc, while we go through them during parsing. This is not set in stone, and we’d love to perform them later if required.

The remaining issues are marked as good-first-pr as they are simple and offer an entrypoint into the compiler’s implementation of macros.

Restrict merged repetitions to metavars with the same amount of repetitions

Likewise, you cannot merge together repetitions which do not have the same amount of repetitions:

macro_rules! tuplomatron {
  ($($e:expr),* ; $($f:expr),*) => { ( $( ( $e, $f ) ),* ) };
}

let tuple = tuplomatron!(1, 2, 3; 4, 5, 6); // valid
let tuple = tuplomatron!(1, 2, 3; 4, 5); // invalid since both metavars do not have the same amount of repetitions

This gets expanded properly into one big tuple:

let tuple = TupleExpr:
 outer attributes: none
 inner attributes: none
Tuple elements:
 TupleExpr:
 outer attributes: none
 inner attributes: none
Tuple elements:
 1
 4
 TupleExpr:
 outer attributes: none
 inner attributes: none
Tuple elements:
 2
 5
 TupleExpr:
 outer attributes: none
 inner attributes: none
Tuple elements:
 3
 6
final expression: none

Handle :tt fragments properly

Having :tt fragments handled properly allows us to dwelve into the world of tt-munchers, a very powerful pattern which allows the implementation of extremely complex behaviors or DSLs. The target code we’re using for this comes directly from The Little Book of Rust Macros by Lukas Wirth, adapted to fit our non-println-aware compiler.

extern "C" {
    fn printf(fmt: *const i8, ...);
}

fn print(name: &str, value: i32) {
    unsafe {
        printf(
            "%s = %d\n\0" as *const str as *const i8,
            name as *const str as *const i8,
            value,
        );
    }
}

macro_rules! mixed_rules {
    () => {{}};
    (trace $name_str:literal $name:ident; $($tail:tt)*) => {
        {
            print($name_str, $name);
            mixed_rules!($($tail)*);
        }
    };
    (trace $name_str:literal $name:ident = $init:expr; $($tail:tt)*) => {
        {
            let $name = $init;
            print($name_str, $name);
            mixed_rules!($($tail)*);
        }
    };
}

fn main() {
    mixed_rules! (trace "a\0" a = 14; trace "a\0" a; trace "b\0" b = 15;);
}

This is now handled by gccrs, and produces the same output as rustc.

~/G/gccrs > rustc tt-muncher.rs
~/G/gccrs > ./tt-muncher
a = 14
a = 14
b = 15
~/G/gccrs > gccrs tt-muncher.rs -o tt-muncher-gccrs
~/G/gccrs > ./tt-muncher-gccrs
a = 14
a = 14
b = 15

Leave a Reply

Your email address will not be published. Required fields are marked *