GCC Rust Monthly Report #15 March 2022

Thanks again to Open Source Security, Inc. and Embecosm for their ongoing support for this project.

Milestone Progress

March was a big month for the project. With Arthur joining Embecosm, we have been able to split up the milestone work: he took over the development of macros, which allowed Philbert to concentrate on the type system. As a result, we have closed out the macros milestone. The remaining work here is completing the built-in macros, which we see as part of our ongoing builtins/intrinsics milestone. Many of these builtins are pretty simple and are a good gateway for new developers to join the project.

Note that there is a drop in the number of passing tests in our test suite. We have not removed any tests; rather, our dejagnu tests previously contained many bad or duplicate unused-code warnings. These are now fixed by reusing the GCC infrastructure for unused-variable detection. This change also removed our old AST unused-code scan and improved our existing dead-code scan pass.

On the automation front, we have added a new pass to check that our front-end continues to compile with the minimum GCC version, so that we don’t break the bootstrap chain, as we are using C++11. Thanks to the community and Thomas Schwinge for their work here.

Moving onto our next milestone of imports and visibility, we see this as breaking down into two streams of work:

  • Metadata exports and use statements
  • privacy checks like rustc_privacy

We are still working through how we will perform metadata exports and have been investigating:

  • C++ modules
  • LTO streaming
  • rolling our own

This will be a great opportunity to clean up and refactor how we perform name resolution to incorporate use statements.

Monthly Community Call

We had our regular community call on 1st April 2022. Please find our meeting notes at: https://github.com/Rust-GCC/Reporting/blob/main/2022-04-01-community-call.md

Testing Project

Recently we have created a new project under the Rust-GCC organisation for automated testing of gccrs in the wild: https://github.com/Rust-GCC/testing. The goal here is to allow automated testing of more complex test cases that don’t have to be part of the automated dejagnu compiler testsuite.

Currently we have automated testing for

  • Test gccrs with -fsyntax-only on rustc testsuite
  • Test rustc against gccrs dejagnu testsuite
    • most failures are because we have not implemented the main-shim

We are aiming to do more here with the aspiration to:

  • Test gccrs against rustc fully
  • Test gccrs against projects like Rust-GCC/gccrs#682
  • Benchmarking research
  • code-generation comparison and research

Leveraging automation allows us to track changes monthly without directly impacting development of the compiler, while allowing those who are interested to recreate the results locally.

Completed Activities

  • Refactor substitution context during macro expansion to be in its own file PR981
  • Enforce quotes during command line cfg arguments PR983
  • Bugfix memory corruption of lexing string buffers PR988
  • Remove bad lambda from iteration of arguments on function types PR984
  • Add must_use attribute support PR990
  • Bug fix parsing macro invocation with semicolons during statement contexts PR985
  • Fix ICE during recursive macro invocations PR986
  • Support repetition separators in macro rules PR991
  • Refactor HIR visitor and split it up in stmt, vis-item, pattern, external-item, impl, type and expression visitors PR954
  • Fix bad unused code warnings PR992
  • Macros can allow any delimiters for the invocation PR997
  • Fix bugs in parsing macro repetitions PR994
  • Refactor ABI options into an enum during HIR lowering PR999
  • Handle macro invocations as statements vs expressions PR998
  • Cleanup how multiple matches are handled PR1002
  • Refactor how builtins/intrinsics are handled and add unreachable, abort, size_of and offset PR1003
  • Bug fix ICE on impl blocks for arrays or slices PR1007
  • Add missing generic substitution for covariant types (slices and arrays) PR1009
  • Add const_ptr lang item mappings PR1008
  • Implement HIR lowering for AST::SliceType PR1016
  • Refactor attribute visitor into its own file PR1017
  • Add more documentation for builtin macros PR1018
  • Generate GCC code for the libcore FatPtr/SliceType PR1015
  • Implement the builtin column! macro PR1004
  • Support placeholders becoming slices PR1037
  • Handle -fsyntax-only PR1035
  • Fix bad copy-paste in can equal interface for pointer types PR1033
  • Add AST kind information PR1032
  • Rewrite our unconstrained type-param error checking PR1030
  • Macro in trait impl block PR1029
  • Allow parsing statements without closing semicolon PR1027
  • Fix memory corruption in generation of builtin functions PR1025
  • Fix spurious stripping of tail expression PR1022
  • Do not try and re-expand macros if depth has exceeded recursion limit PR1021
  • Enable -Werror in CI PR1026
  • Do not propagate parser errors in match_repetitions PR1040
  • Only expand merged repetitions if they contain the same amount PR1041
  • Implement include_bytes! and include_str! PR1043
  • Restrict follow up tokens on :expr and :stmt PR1044
  • Add helper function for substituted tokens debugging PR1047
  • Add better restrictions around semicolons in statements parsing PR1049
  • Add remaining restrictions for follow-set restrictions PR1051
  • Add hints for valid follow-set tokens PR1052
  • Fix overzealous follow-set ambiguity PR1054
  • Allow checking past zeroable matches for follow-set restrictions PR1055
  • Fix #include <algorithm> PR1056
  • Provide std::hash for Rust::AST::MacroFragSpec::Kind enum class PR1057
  • Properly perform follow-set checking on matchers PR1062
  • Handle :tt fragments properly PR1064
  • Handle :meta fragments properly PR1063

Contributors this month

Overall Task Status

Category | Last Month | This Month | Delta
TODO | 118 | 114 | -4
In Progress | 17 | 23 | +6
Completed | 297 | 338 | +41
GitHub Issues

Test Cases

Category | Last Month | This Month | Delta
Passing | 6068 | 5701 | -367
Failed | - | - | -
XFAIL | 21 | 22 | +1
XPASS | - | - | -
make check-rust

Bugs

Category | Last Month | This Month | Delta
TODO | 40 | 39 | -1
In Progress | 5 | 10 | +5
Completed | 109 | 130 | +21
GitHub Bugs

Milestones Progress

Milestone | Last Month | This Month | Delta | Start Date | Completion Date | Target
Data Structures 1 – Core | 100% | 100% | - | 30th Nov 2020 | 27th Jan 2021 | 29th Jan 2021
Control Flow 1 – Core | 100% | 100% | - | 28th Jan 2021 | 10th Feb 2021 | 26th Feb 2021
Data Structures 2 – Generics | 100% | 100% | - | 11th Feb 2021 | 14th May 2021 | 28th May 2021
Data Structures 3 – Traits | 100% | 100% | - | 20th May 2021 | 17th Sept 2021 | 27th Aug 2021
Control Flow 2 – Pattern Matching | 100% | 100% | - | 20th Sept 2021 | 9th Dec 2021 | 29th Nov 2021
Macros and cfg expansion | 65% | 100% | +35% | 1st Dec 2021 | - | 28th Mar 2022
Imports and Visibility | 0% | 0% | - | 29th Mar 2022 | - | 27th May 2022
Const Generics | 0% | 0% | - | 30th May 2022 | - | 25th Jul 2022
Intrinsics | 0% | 0% | - | 6th Sept 2021 | - | 30th Sept 2022
GitHub Milestones

Risks

Risk | Impact (1-3) | Likelihood (0-10) | Risk (I * L) | Mitigation
Rust Language Changes | 3 | 7 | 21 | Keep up to date with the Rust language on a regular basis
Going over target dates | 3 | 5 | 15 | Maintain status reports and issue tracking to stakeholders

Rustc testsuite with -fsyntax-only

Category | Last Month | This Month | Delta
Passing | - | 10618 | -
Failed | - | 2436 | -
https://github.com/Rust-GCC/testing

Planned Activities

  • Continue research into rustc metadata exports
  • Fix bugs with generic associated types
  • Begin work on privacy pass akin to rustc_privacy

Detailed changelog

must_use attribute

To support must_use, we build on the fact that the GCC C++ front-end already supports the C++ nodiscard attribute, which is analogous to Rust's must_use attribute. Rust also supports must_use on types, which we still need to test and support, but this is the building block for handling functions whose results are discarded.

#[must_use = "TEST 1"]
fn test1() -> i32 {
    123
}

#[must_use = "TEST 2"]
fn test2() -> i32 {
    456
}

fn main() {
    let _a = test1();

    test2();
}

The warning respects GCC's -Wunused-result option, which is turned on by default in the front-end.

<source>:14:5: warning: ignoring return value of 'example::test2', that must be used: 'TEST 2' [-Wunused-result]
   14 |     test2();
      |     ^
<source>:7:1: note: declared here
    7 | fn test2() -> i32 {
      | ^

see: https://godbolt.org/z/81j9G584e
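For reference, this is roughly what must_use on a type looks like in standard Rust. It is a minimal sketch checked against rustc semantics only, since type support in gccrs is described above as still pending; the Token type and its functions are hypothetical names used for illustration.

```rust
// Hypothetical type, used only to illustrate must_use on a type.
#[must_use = "a Token does nothing unless it is redeemed"]
struct Token(i32);

impl Token {
    // Consumes the token and returns twice its value.
    fn redeem(self) -> i32 {
        self.0 * 2
    }
}

// Any function returning a Token inherits the must_use requirement.
fn make_token(value: i32) -> Token {
    Token(value)
}

fn main() {
    // Uncommenting the next line makes rustc emit an unused_must_use warning,
    // because the returned Token is discarded:
    // make_token(1);

    // No warning here: the value is consumed.
    let total = make_token(21).redeem();
    println!("{}", total);
}
```

With the attribute on the type, every function returning it is covered without annotating each function individually.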

Recursive macros using separators

Macros can be recursive, resulting in new macro invocations which need to be expanded. Their matchers can also contain repetitions, which work much like regular expressions: they accept any number of arguments delimited by a single separator token that terminates each element of the sequence. This looks very similar to bison grammar files, and it is pretty impressive how expressive macros are in Rust.

macro_rules! add {
    ($e:expr | $($es:expr) | *) => {
        $e + add!($($es) | *)
    };
    ($e:expr) => {
        $e
    };
}

fn test() -> i32 {
    add!(1 | 2 | 3 | 4 | 5 | 6)
}

see: https://godbolt.org/z/TfWrEovf3

Implement proper repetition separators

Rust allows users to define separators to use in macro repetitions. These separators help in making repeating macro invocations cleaner, and avoid this:

macro_rules! add0 {
    ($a:literal) => { $a };
    ($a:literal $($b:literal)+) => { $a + add0!($($b)*) }
}

macro_rules! add1 {
    ($a:literal,) => { $a };
    ($a:literal, $($b:literal,)+) => { $a + add1!($($b ,)*) }
}

add0!(1 2 3 4 67); // no separator
add1!(1, 2, 3, 4, 67,); // extra separator

Macro repetition separators are made of one token and are positioned just before the repetition operator (* or +; in Rust, the ? operator does not take a separator). We can now parse them, match them and expand them properly:

macro_rules! add {
    ($a:literal) => { $a };
    ($a:literal, $($b:literal),+) => { $a + add!($($b),*) }
}

add!(1, 2, 3, 4, 67);
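The separator does not have to be a comma. As a small sketch in standard Rust (checked against rustc semantics; the product macro is a hypothetical name introduced here), the same pattern works with a semicolon:

```rust
// Hypothetical macro: multiplies literals separated by semicolons.
macro_rules! product {
    // Base case: a single literal.
    ($a:literal) => { $a };
    // Recursive case: `;` is the separator, placed just before the `+` operator.
    ($a:literal; $($b:literal);+) => { $a * product!($($b);*) };
}

fn main() {
    // Expands to 2 * (3 * 7).
    let p = product!(2; 3; 7);
    println!("{}", p);
}
```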

Defining items and statements through macros

Macros can be used to avoid boilerplate and repetitive code, such as defining a large number of types and their implementations when they are all similar.

This can be seen in the Rust standard library in various code related to builtin types:

// Reduced version.
// This implements the `Sub` trait for all builtin number types
// The implementation is always the same, so macros help
pub trait Sub<Rhs = Self> {
    type Output;
    fn sub(self, rhs: Rhs) -> Self::Output;
}

macro_rules! sub_impl {
    ($($t:ty)*) => ($(
        impl Sub for $t {
            type Output = $t;

            #[inline]
            fn sub(self, other: $t) -> $t { self - other }
        }
    )*)
}

sub_impl! { usize u8 u16 u32 u64 u128 isize i8 i16 i32 i64 i128 f32 f64 }

This expands to a proper implementation of the Sub trait for all types mentioned, with proper expansion of the sub method and associated Output type. We are now able to parse those items correctly and expand them in place.

Likewise, macro invocations can also be expanded to multiple statements inside a block:

macro_rules! define_vars {
    ($([ $name:ident $value:literal ])*) => {
        $(let $name = $value;)*
    }
}

fn needs_lots_of_locals() {
    define_vars!([pear 14] [apple 'm'] [mango "Pi"]);
}

Expanding macros in more contexts

Last week’s macro improvements were focused on adding a base for in-place macro expansion. We worked on getting them properly expanded in two places, namely block statements and as crate items. However, macros can be used in many more ways:

A macro invocation expands a macro at compile time and replaces the invocation with the result of the macro. Macros may be invoked in the following situations:

  1. Expressions and statements
  2. Patterns
  3. Types
  4. Items including associated items
  5. macro_rules transcribers
  6. External blocks

You can now call macros from inside impl blocks, external blocks and trait definitions or implementations. If you’ve been following the Rust-for-Linux effort, you might have seen this pattern when defining file operations for a type. This allows defining your own function or relying on the kernel’s defaults safely.

macro_rules! c_fn {
    (int $name:ident ( const char_ptr $arg_name:ident)) => {
        fn $name($arg_name: *const i8) -> i32;
    };
}

extern "C" {
    c_fn! {int puts (const char_ptr s)}
}

macro_rules! add_distract_fn {
    () => {
        fn distract() {
            unsafe {
                puts("wait this isn't C\0" as *const str as *const i8);
            }
        }
    };
}

struct Abstract;

impl Abstract {
    add_distract_fn!();
}

macro_rules! require_proc {
    ($fn_name:ident) => {
        fn $fn_name();
    };
}

trait Abstractable {
    require_proc!(extract);
}

macro_rules! extract {
    ($fn_block:block) => {
        fn extract() $fn_block
    }
}

impl Abstractable for Abstract {
    extract! {{ Abstract::distract(); }}
}
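Two of the other positions from the list above, types and patterns, can be sketched in standard Rust as follows. This is checked against rustc semantics only; the report does not state whether gccrs expanded macros in these positions at the time, and the int_ty, digit and is_digit names are hypothetical.

```rust
// Hypothetical macro used in type position.
macro_rules! int_ty {
    () => { i32 };
}

// Hypothetical macro used in pattern position.
macro_rules! digit {
    () => { 0..=9 };
}

// `int_ty!()` expands to the type i32, `digit!()` to the range pattern 0..=9.
fn is_digit(n: int_ty!()) -> bool {
    match n {
        digit!() => true,
        _ => false,
    }
}

fn main() {
    println!("{} {}", is_digit(7), is_digit(42));
}
```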

Relaxed parsing rules in macro definitions and invocations

To improve usability, parsing rules when expanding macro nodes are a little more relaxed. As an example, this is completely valid Rust code:

macro_rules! take_stmt {
    ($s:stmt) => {
        $s
    };
}

fn f() -> i32 {
    16
}

macro_rules! expand_to_stmt_or_expr {
    () => {
        f()
    };
}

fn main() {
    take_stmt!(let a1 = 15);

    let a2 = {
        expand_to_stmt_or_expr!(); // f is called as an expression-statement
        expand_to_stmt_or_expr!() // f is called as a tail expression
    };
}

This is now handled properly and makes for prettier macros and invocations, and avoids the necessity of adding extra semicolons in some cases.

include bytes builtin

Two new macro builtins have been added to the compiler thanks to David Faust: include_bytes! and include_str!. They allow the user to include files at compilation time, either as bytes or valid UTF-8 strings. This can be extremely useful for anyone dealing with binary blobs, and adds even more code for new contributors to reuse when adding more builtin macros.

Their definition is as follows:

macro_rules! include_str {
    ($file:expr $(,)?) => {{ /* compiler built-in */ }};
}
macro_rules! include_bytes {
    ($file:expr $(,)?) => {{ /* compiler built-in */ }};
}

Follow-set ambiguities

While Rust macros are extremely powerful, they are also heavily restricted to prevent ambiguities. These restrictions include sets of allowed fragments that can follow a certain metavariable fragment, which are referred to as follow-sets.

As an example, the follow-set of :expr fragments is { COMMA, SEMICOLON, MATCH_ARROW }. No other token can follow an :expr fragment, as it might cause ambiguities in later versions of the language.

This was previously not handled by gccrs at all. As a result, we had some test cases that contained ambiguous macro definitions that rustc rejected.
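As a rough illustration of the :expr rule, here is a definition in which every :expr fragment is followed only by tokens from its follow-set. It is a sketch in standard Rust, checked against rustc semantics; the pick and bad_pick macros are hypothetical names.

```rust
// Hypothetical macro: each :expr is followed only by `=>`, `,` or `;`,
// all of which are in the follow-set of :expr.
macro_rules! pick {
    ($cond:expr => $a:expr, $b:expr;) => {
        if $cond { $a } else { $b }
    };
}

// Rejected by rustc if uncommented: the identifier `then` is not in the
// follow-set of :expr.
// macro_rules! bad_pick {
//     ($cond:expr then $a:expr) => { if $cond { $a } else { 0 } };
// }

fn main() {
    let x = pick!(1 < 2 => 10, 20;);
    println!("{}", x);
}
```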

We dedicated some time this week to implement (almost!) all of these restrictions, including some complex cases involving repetitions:

Looking past zeroable repetitions

macro_rules! invalid {
  ($e:expr $(,)? $(;)* $(=>)* forbidden) => {{}};
  //  1      2     3     4        5         (matches)
}

Since matches 2, 3 and 4 might occur zero times (Kleene operators * or ?), we need to check that the forbidden token is allowed to follow an :expr fragment. This is not the case, since identifier tokens are not contained in its follow-set.

On the other hand, this macro is perfectly valid, since a comma, which is contained in the follow-set of :expr, is required to appear at least once before any forbidden tokens:

macro_rules! valid {
  ($e:expr $(;)* $(,)+ $(=>)* forbidden) => {{}};
  // The `+` Kleene operator indicates one or more repetitions, meaning that there will always be at least one comma
}

Metavar fragments following other metavar fragments

macro_rules! mac {
  ($t:ty $lit:literal) => {{}}; // invalid
  ($t:ty $lit:block) => {{}}; // valid
}

The follow-set of :ty fragments allows the user to specify another fragment as follow-up, but only if this metavar fragment is a :block one.
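A short usage sketch of the valid form, in standard Rust and checked against rustc semantics (the typed_block macro is a hypothetical name introduced for illustration):

```rust
// Hypothetical macro: a :ty fragment directly followed by a :block fragment,
// which the follow-set of :ty allows.
macro_rules! typed_block {
    ($t:ty $b:block) => {{
        // Evaluate the block and give the result the requested type.
        let value: $t = $b;
        value
    }};
}

fn main() {
    let n = typed_block!(u8 { 40 + 2 });
    println!("{}", n);
}
```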

An interesting tidbit is that these checks are performed at the beginning of the expansion phase in rustc, while we go through them during parsing. This is not set in stone, and we’d love to perform them later if required.

The remaining issues are marked as good-first-pr as they are simple and offer an entrypoint into the compiler’s implementation of macros.

Restrict merged repetitions to metavars with the same amount of repetitions

Likewise, you cannot merge together metavars which do not have the same number of repetitions:

macro_rules! tuplomatron {
  ($($e:expr),* ; $($f:expr),*) => { ( $( ( $e, $f ) ),* ) };
}

let tuple = tuplomatron!(1, 2, 3; 4, 5, 6); // valid
let tuple = tuplomatron!(1, 2, 3; 4, 5); // invalid since both metavars do not have the same amount of repetitions

This gets expanded properly into one big tuple:

let tuple = TupleExpr:
 outer attributes: none
 inner attributes: none
Tuple elements:
 TupleExpr:
 outer attributes: none
 inner attributes: none
Tuple elements:
 1
 4
 TupleExpr:
 outer attributes: none
 inner attributes: none
Tuple elements:
 2
 5
 TupleExpr:
 outer attributes: none
 inner attributes: none
Tuple elements:
 3
 6
final expression: none
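The same expansion can also be checked by value rather than through the AST dump. A minimal sketch in standard Rust, reusing the tuplomatron macro from above (checked against rustc semantics; the main function is added for illustration):

```rust
macro_rules! tuplomatron {
    // Both repetitions are transcribed together, so they must have the same length.
    ($($e:expr),* ; $($f:expr),*) => { ( $( ( $e, $f ) ),* ) };
}

fn main() {
    // Pairs the n-th element of the first list with the n-th of the second.
    let tuple = tuplomatron!(1, 2, 3; 4, 5, 6);
    assert_eq!(tuple, ((1, 4), (2, 5), (3, 6)));
    println!("{:?}", tuple);
}
```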

Handle :tt fragments properly

Having :tt fragments handled properly allows us to delve into the world of tt-munchers, a very powerful pattern which allows the implementation of extremely complex behaviors or DSLs. The target code we’re using for this comes directly from The Little Book of Rust Macros by Lukas Wirth, adapted to fit our non-println-aware compiler.

extern "C" {
    fn printf(fmt: *const i8, ...);
}

fn print(name: &str, value: i32) {
    unsafe {
        printf(
            "%s = %d\n\0" as *const str as *const i8,
            name as *const str as *const i8,
            value,
        );
    }
}

macro_rules! mixed_rules {
    () => {{}};
    (trace $name_str:literal $name:ident; $($tail:tt)*) => {
        {
            print($name_str, $name);
            mixed_rules!($($tail)*);
        }
    };
    (trace $name_str:literal $name:ident = $init:expr; $($tail:tt)*) => {
        {
            let $name = $init;
            print($name_str, $name);
            mixed_rules!($($tail)*);
        }
    };
}

fn main() {
    mixed_rules! (trace "a\0" a = 14; trace "a\0" a; trace "b\0" b = 15;);
}

This is now handled by gccrs, and produces the same output as rustc.

~/G/gccrs > rustc tt-muncher.rs
~/G/gccrs > ./tt-muncher
a = 14
a = 14
b = 15
~/G/gccrs > gccrs tt-muncher.rs -o tt-muncher-gccrs
~/G/gccrs > ./tt-muncher-gccrs
a = 14
a = 14
b = 15
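For readers new to the pattern, a smaller tt-muncher may make the mechanism easier to see. This is a sketch in standard Rust, checked against rustc semantics; the count_tts macro is a hypothetical name that consumes one token tree per recursion step.

```rust
// Hypothetical tt-muncher: counts the token trees it is given.
macro_rules! count_tts {
    // Nothing left to munch.
    () => { 0usize };
    // Munch one token tree and recurse on the tail.
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    // A parenthesised group counts as a single token tree.
    let n = count_tts!(a (b c) d);
    println!("{}", n);
}
```

The parenthesised group `(b c)` is matched by a single :tt fragment, which is what lets tt-munchers walk arbitrary input one tree at a time.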
