Thanks again to Open Source Security, Inc. and Embecosm for their ongoing support for this project.
Milestone Progress
March was a big month for the project: with Arthur joining Embecosm, we have been able to split up the milestone work. With his expertise, he took over the development of macros, allowing Philbert to concentrate on the type system. As a result, we have closed out the macros milestone. The remaining work here is completing the built-in macros, which we see as part of our builtins/intrinsics milestone, which is ongoing anyway. Many of these builtins are pretty simple and are a gateway for new developers to join the project as they see fit.
Note that there is a drop in the number of passing tests in our test suite. We have not removed any tests; the drop comes from dejagnu, where we had many bad or duplicate unused-code warnings. These are now fixed by reusing GCC infrastructure for unused variable detection. This change also removed our old AST unused-code scan and improved our existing dead-code scan pass.
On the automation front, we have added a new check that our front-end continues to compile with the minimum GCC version, so that we don’t break the bootstrap chain as we are using C++11. Thanks to the community and Thomas Schwinge for his work here.
Moving onto our next milestone of imports and visibility, we see this as breaking down into two streams of work:
- Metadata exports and use statements
- privacy checks like rustc_privacy
We are still working through how we will perform metadata exports and have been investigating:
- C++ modules
- LTO streaming
- rolling our own
This will be a great opportunity to clean up and refactor how we perform name resolution to incorporate use statements.
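To make the two streams of work more concrete, here is a minimal sketch in standard Rust of what use statements and privacy checks involve. The module and function names (shapes, scale, area, compute) are invented for this illustration and do not come from gccrs or its testsuite; the behaviour described in the comments is that of rustc.

```rust
// Hypothetical module, used only to illustrate imports and visibility.
mod shapes {
    // Private helper: only visible inside `shapes`.
    fn scale(x: i32) -> i32 {
        x * 2
    }

    // Public item: other modules may import it with a `use` statement.
    pub fn area(w: i32, h: i32) -> i32 {
        scale(w) * h
    }
}

// Name resolution has to map the bare name `area` to `shapes::area`.
use shapes::area;

pub fn compute() -> i32 {
    // shapes::scale(1); // rustc rejects this: `scale` is private (E0603)
    area(2, 3)
}
```

A privacy pass has to reject the commented-out call, while name resolution has to accept the imported name.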
Monthly Community Call
We had our regular community call on 1st April 2022. Please find our meeting notes at: https://github.com/Rust-GCC/Reporting/blob/main/2022-04-01-community-call.md
Testing Project
We recently created a new project under the Rust-GCC organisation for automated testing of gccrs in the wild: https://github.com/Rust-GCC/testing. The goal here is to allow automated testing of more complex test cases that don’t have to be part of the automated dejagnu compiler testsuite.
Currently the automated testing covers:
- gccrs with -fsyntax-only on the rustc testsuite
- rustc against the gccrs dejagnu testsuite
  - most failures here are because we have not implemented the main-shim
We are aiming to do more here with the aspiration to:
- Test gccrs against rustc fully
- Test gccrs against projects like Rust-GCC/gccrs#682
- Benchmarking research
- code-generation comparison and research
Leveraging automation allows us to track changes monthly without directly impacting development of the compiler, while allowing those who are interested to recreate the results locally.
Completed Activities
- Refactor substitution context during macro expansion to be in its own file PR981
- Enforce quotes during command line cfg arguments PR983
- Bugfix memory corruption of lexing string buffers PR988
- Remove bad lambda from iteration of arguments on function types PR984
- Add must_use attribute support PR990
- Bug fix parsing macro invocations with semicolons in statement contexts PR985
- Fix ICE during recursive macro invocations PR986
- Support repetition separators in macro rules PR991
- Refactor HIR visitor and split it up in stmt, vis-item, pattern, external-item, impl, type and expression visitors PR954
- Fix bad unused code warnings PR992
- Macros can allow any delimiters for the invocation PR997
- Fix bugs in parsing macro repetitions PR994
- Refactor ABI options into an enum during HIR lowering PR999
- Handle macro invocations as statements vs expressions PR998
- Cleanup how multiple matches are handled PR1002
- Refactor how builtins/intrinsics are handled and add unreachable, abort, size_of and offset PR1003
- Bug fix ICE on impl blocks for arrays or slices PR1007
- Add missing generic substitution for covariant types slices and arrays PR1009
- Add const_ptr lang item mappings PR1008
- Implement HIR lowering for AST::SliceType PR1016
- Refactor attribute visitor into its own file PR1017
- Add more documentation for builtin macros PR1018
- Generate GCC code for the libcore FatPtr/SliceType PR1015
- Implement the builtin column! macro PR1004
- Support placeholders becoming slices PR1037
- Handle -fsyntax-only PR1035
- Fix bad copy-paste in can equal interface for pointer types PR1033
- Add AST kind information PR1032
- Rewrite our unconstrained type-param error checking PR1030
- Macro in trait impl block PR1029
- Allow parsing statements without closing semicolon PR1027
- Fix memory corruption in generation of builtin functions PR1025
- Fix spurious stripping of tail expression PR1022
- Do not try and re-expand macros if depth has exceeded recursion limit PR1021
- Enable -Werror in CI PR1026
- Do not propagate parser errors in match_repetitions PR1040
- Only expand merged repetitions if they contain the same amount PR1041
- Implement include_bytes! and include_str! PR1043
- Restrict follow up tokens on :expr and :stmt PR1044
- Add helper function for substituted tokens debugging PR1047
- Add better restrictions around semicolons in statements parsing PR1049
- Add remaining restrictions for follow-set restrictions PR1051
- Add hints for valid follow-set tokens PR1052
- Fix overzealous follow-set ambiguity PR1054
- Allow checking past zeroable matches for follow-set restrictions PR1055
- Fix #include <algorithm> PR1056
- Provide std::hash for Rust::AST::MacroFragSpec::Kind enum class PR1057
- Properly perform follow-set checking on matchers PR1062
- Handle :tt fragments properly PR1064
- Handle :meta fragments properly PR1063
Contributors this month
Overall Task Status
Category | Last Month | This Month | Delta |
TODO | 118 | 114 | -4 |
In Progress | 17 | 23 | +6 |
Completed | 297 | 338 | +41 |
Test Cases
Category | Last Month | This Month | Delta |
Passing | 6068 | 5701 | -367 |
Failed | – | – | – |
XFAIL | 21 | 22 | +1 |
XPASS | – | – | – |
Bugs
Category | Last Month | This Month | Delta |
TODO | 40 | 39 | -1 |
In Progress | 5 | 10 | +5 |
Completed | 109 | 130 | +21 |
Milestones Progress
Milestone | Last Month | This Month | Delta | Start Date | Completion Date | Target |
Data Structures 1 – Core | 100% | 100% | – | 30th Nov 2020 | 27th Jan 2021 | 29th Jan 2021 |
Control Flow 1 – Core | 100% | 100% | – | 28th Jan 2021 | 10th Feb 2021 | 26th Feb 2021 |
Data Structures 2 – Generics | 100% | 100% | – | 11th Feb 2021 | 14th May 2021 | 28th May 2021 |
Data Structures 3 – Traits | 100% | 100% | – | 20th May 2021 | 17th Sept 2021 | 27th Aug 2021 |
Control Flow 2 – Pattern Matching | 100% | 100% | – | 20th Sept 2021 | 9th Dec 2021 | 29th Nov 2021 |
Macros and cfg expansion | 65% | 100% | +35% | 1st Dec 2021 | – | 28th Mar 2022 |
Imports and Visibility | 0% | 0% | – | 29th Mar 2022 | – | 27th May 2022 |
Const Generics | 0% | 0% | – | 30th May 2022 | – | 25th Jul 2022 |
Intrinsics | 0% | 0% | – | 6th Sept 2021 | – | 30th Sept 2022 |
Risks
Risk | Impact (1-3) | Likelihood (0-10) | Risk (I * L) | Mitigation |
Rust Language Changes | 3 | 7 | 21 | Keep up to date with the Rust language on a regular basis |
Going over target dates | 3 | 5 | 15 | Maintain status reports and issue tracking to stakeholders |
Rustc testsuite with -fsyntax-only
Category | Last Month | This Month | Delta |
Passing | – | 10618 | – |
Failed | – | 2436 | – |
Planned Activities
- Continue research into rustc metadata exports
- fix bugs with generic associated types
- begin work on privacy pass akin to rustc_privacy
Detailed changelog
must_use attribute
To support must_use, we rely on the GCC C++ front-end, which already supports the C++ nodiscard attribute, the analogue of Rust’s must_use attribute. Rust also supports must_use on types, which we still need to test and support, but this is the building block for warning on calls to functions whose results are discarded.
#[must_use = "TEST 1"]
fn test1() -> i32 {
    123
}

#[must_use = "TEST 2"]
fn test2() -> i32 {
    456
}

fn main() {
    let _a = test1();

    test2();
}
The warning respects GCC’s -Wunused-result option, which is turned on by default in the front-end.
<source>:14:5: warning: ignoring return value of 'example::test2', that must be used: 'TEST 2' [-Wunused-result]
   14 |     test2();
      |     ^
<source>:7:1: note: declared here
    7 | fn test2() -> i32 {
      | ^
see: https://godbolt.org/z/81j9G584e
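For reference, here is a small sketch of what must_use on a type looks like in standard Rust, which is the case described above as still needing testing and support in gccrs. The Token type and the issue, consume and demo functions are hypothetical names chosen for this illustration, and the behaviour described in the comments is that of rustc.

```rust
// Hypothetical type, used only for illustration.
#[must_use = "a Token must be consumed"]
pub struct Token {
    pub id: u32,
}

// Returns a value of a `must_use` type, so callers that discard the
// result get an `unused_must_use` warning from rustc.
pub fn issue(id: u32) -> Token {
    Token { id }
}

pub fn consume(t: Token) -> u32 {
    t.id
}

pub fn demo() -> u32 {
    // issue(1); // would warn in rustc: unused `Token` that must be used
    let t = issue(2); // binding the value counts as using it
    consume(t)
}
```

Here the attribute sits on the type rather than on each function, so every function returning Token inherits the check.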
Recursive macros using separators
Macros can be recursive, producing new macro invocations which in turn need to be expanded. Their matchers also behave somewhat like regular expressions: a repetition can require any number of arguments, delimited by a single separator token, before the sequence terminates. This looks very similar to bison grammar files, and it is pretty impressive how expressive macros are in Rust.
macro_rules! add {
    ($e:expr | $($es:expr) | *) => {
        $e + add!($($es) | *)
    };
    ($e:expr) => {
        $e
    };
}

fn test() -> i32 {
    add!(1 | 2 | 3 | 4 | 5 | 6)
}
see: https://godbolt.org/z/TfWrEovf3
Implement proper repetition separators
Rust allows users to define separators to use in macro repetitions. These separators help in making repeating macro invocations cleaner, and avoid this:
macro_rules! add0 {
    ($a:literal) => { $a };
    ($a:literal $($b:literal)+) => { $a + add0!($($b)*) }
}

macro_rules! add1 {
    ($a:literal,) => { $a };
    ($a:literal, $($b:literal,)+) => { $a + add1!($($b ,)*) }
}

add0!(1 2 3 4 67); // no separator
add1!(1, 2, 3, 4, 67,); // extra separator
Macro repetition separators are made of one token and positioned just before the repetition operator (?, * or +). We can now parse them, match them and expand them properly:
macro_rules! add {
    ($a:literal) => { $a };
    ($a:literal, $($b:literal),+) => { $a + add!($($b),*) }
}

add!(1, 2, 3, 4, 67);
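The separator does not have to be a comma, and the transcriber is free to use a different separator from the matcher. The following is a minimal sketch in standard Rust; the sum and to_array macros and the total and items functions are hypothetical names for this example, and it has not been checked against gccrs here.

```rust
// Sums expressions separated by `;`. The separator token sits between the
// closing parenthesis of the repetition and the `+` repetition operator.
macro_rules! sum {
    ($($x:expr);+) => { 0 $(+ $x)+ };
}

// Builds an array, swapping the `;` separator for `,` in the transcriber.
macro_rules! to_array {
    ($($x:expr);+) => { [$($x),+] };
}

pub fn total() -> i32 {
    sum!(1; 2; 3; 4; 67)
}

pub fn items() -> [i32; 3] {
    to_array!(10; 20; 30)
}
```

In sum, the transcriber repetition has no separator at all: each match contributes its own leading + token.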
Defining items and statements through macros
Macros can be used to avoid boilerplate and repetitive code, such as defining a large number of types and their implementations when they are all similar.
This can be seen in the Rust standard library in various code related to builtin types:
// Reduced version.
// This implements the `Sub` trait for all builtin number types
// The implementation is always the same, so macros help
pub trait Sub<Rhs = Self> {
    type Output;

    fn sub(self, rhs: Rhs) -> Self::Output;
}

macro_rules! sub_impl {
    ($($t:ty)*) => ($(
        impl Sub for $t {
            type Output = $t;

            #[inline]
            fn sub(self, other: $t) -> $t { self - other }
        }
    )*)
}

sub_impl! { usize u8 u16 u32 u64 u128 isize i8 i16 i32 i64 i128 f32 f64 }
This expands to a proper implementation of the Sub trait for all types mentioned, with proper expansion of the sub method and associated Output type. We are now able to parse those items correctly and expand them in place.
Likewise, macro invocations can also be expanded to multiple statements inside a block:
macro_rules! define_vars {
    ($([ $name:ident $value:literal ])*) => {
        $(let $name = $value;)*
    }
}

fn needs_lots_of_locals() {
    define_vars!([pear 14] [apple 'm'] [mango "Pi"]);
}
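Because the variable names are supplied by the caller, the locals created by the expansion remain usable in the rest of the block. Here is a small sketch in standard Rust that relies on this; the sum_locals function and the values used are assumptions made for this example.

```rust
macro_rules! define_vars {
    ($([ $name:ident $value:literal ])*) => {
        $(let $name = $value;)*
    }
}

pub fn sum_locals() -> i32 {
    // Expands to `let pear = 14; let apple = 28;` inside this block.
    define_vars!([pear 14] [apple 28]);
    // The identifiers come from the call site, so they are visible here.
    pear + apple
}
```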
Expanding macros in more contexts
Last week’s macro improvements were focused on adding a base for in-place macro expansion. We worked on getting them properly expanded in two places, namely block statements and as crate items. However, macros can be used in many more ways:
A macro invocation expands a macro at compile time and replaces the invocation with the result of the macro. Macros may be invoked in the following situations:
- Expressions and statements
- Patterns
- Types
- Items including associated items
- macro_rules transcribers
- External blocks
You can now call macros from inside impl blocks, external blocks and trait definitions or implementations. If you’ve been following the Rust-for-Linux effort, you might have seen this pattern when defining file operations for a type. This allows defining your own function or relying on the kernel’s defaults safely.
macro_rules! c_fn {
    (int $name:ident ( const char_ptr $arg_name:ident)) => {
        fn $name($arg_name: *const i8) -> i32;
    };
}

extern "C" {
    c_fn! {int puts (const char_ptr s)}
}

macro_rules! add_distract_fn {
    () => {
        fn distract() {
            unsafe {
                puts("wait this isn't C\0" as *const str as *const i8);
            }
        }
    };
}

struct Abstract;

impl Abstract {
    add_distract_fn!();
}

macro_rules! require_proc {
    ($fn_name:ident) => {
        fn $fn_name();
    };
}

trait Abstractable {
    require_proc!(extract);
}

macro_rules! extract {
    ($fn_block:block) => {
        fn extract() $fn_block
    }
}

impl Abstractable for Abstract {
    extract! {{ Abstract::distract(); }}
}
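The list above also mentions types and patterns. As a hedged illustration of those two contexts in standard Rust (not checked against gccrs here), the sketch below uses one macro in type position and one in pattern position. The pair_of and origin macros and the swap and is_origin functions are hypothetical names introduced for this example.

```rust
// Expands in type position to a tuple of two identical types.
macro_rules! pair_of {
    ($t:ty) => { ($t, $t) };
}

// Expands in pattern position to a tuple pattern.
macro_rules! origin {
    () => { (0, 0) };
}

pub fn swap(p: pair_of!(i32)) -> pair_of!(i32) {
    (p.1, p.0)
}

pub fn is_origin(p: pair_of!(i32)) -> bool {
    match p {
        origin!() => true,
        _ => false,
    }
}
```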
Relaxed parsing rules in macro definitions and invocations
To improve usability, parsing rules when expanding macro nodes are a little more relaxed. As an example, this is completely valid Rust code:
macro_rules! take_stmt {
    ($s:stmt) => {
        $s
    };
}

fn f() -> i32 {
    16
}

macro_rules! expand_to_stmt_or_expr {
    () => {
        f()
    };
}

fn main() {
    take_stmt!(let a1 = 15);

    let a2 = {
        expand_to_stmt_or_expr!(); // f is called as an expression-statement
        expand_to_stmt_or_expr!() // f is called as a tail expression
    };
}
This is now handled properly and makes for prettier macros and invocations, and avoids the necessity of adding extra semicolons in some cases.
include_bytes! and include_str! builtins
Two new macro builtins have been added to the compiler thanks to David Faust: include_bytes! and include_str!. They allow the user to include files at compilation time, either as bytes or as valid UTF-8 strings. This can be extremely useful for anyone dealing with binary blobs, and adds even more code for new contributors to reuse when adding more builtin macros.
Their definition is as follows:
macro_rules! include_str {
    ($file:expr $(,)?) => {{ /* compiler built-in */ }};
}

macro_rules! include_bytes {
    ($file:expr $(,)?) => {{ /* compiler built-in */ }};
}
Follow-set ambiguities
While Rust macros are extremely powerful, they are also heavily restricted to prevent ambiguities. These restrictions include sets of allowed fragments that can follow a certain metavariable fragment, which are referred to as follow-sets.
As an example, the follow-set of :expr fragments is { COMMA, SEMICOLON, MATCH_ARROW }. No other token can follow an :expr fragment, as it might cause ambiguities in later versions of the language.
This was previously not handled by gccrs at all. As a result, we had some test cases that contained ambiguous macro definitions that rustc rejected.
We dedicated some time this week to implement (almost!) all of these restrictions, including some complex cases involving repetitions:
Looking past zeroable repetitions
macro_rules! invalid {
    ($e:expr $(,)? $(;)* $(=>)* forbidden) => {{}};
    // 1       2     3     4      5         (matches)
}
Since matches 2, 3 and 4 might occur zero times (Kleene operators * or ?), we need to check that the forbidden token is allowed to follow an :expr fragment, which is not the case since identifier tokens are not contained in its follow-set.
On the other hand, this macro is perfectly valid since a comma, contained in the follow-set of :expr, is required to appear at least once before any forbidden tokens:
macro_rules! valid {
    ($e:expr $(;)* $(,)+ $(=>)* forbidden) => {{}};
    // The `+` Kleene operator indicates one or more, meaning that there will always be at least one comma
}
Metavar fragments following other metavar fragments
macro_rules! mac {
    ($t:ty $lit:literal) => {{}}; // invalid
    ($t:ty $lit:block) => {{}}; // valid
}
The follow-set of :ty fragments allows the user to specify another fragment as follow-up, but only if this metavar fragment is a :block one.
An interesting tidbit is that these checks are performed at the beginning of the expansion phase in rustc, while we go through them during parsing. This is not set in stone, and we’d love to perform them later if required.
The remaining issues are marked as good-first-pr as they are simple and offer an entrypoint into the compiler’s implementation of macros.
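To complement the rejected definitions above, here is a minimal sketch of two matchers that respect the follow-set rules, written in standard Rust and not checked against gccrs here. The entry and typed macros and the make_entry and make_typed functions are hypothetical names used only for this example.

```rust
// `=>` is in the follow-set of `:expr`, so this matcher is accepted.
macro_rules! entry {
    ($k:expr => $v:expr) => { ($k, $v) };
}

// A `:ty` fragment may be followed by a `:block` fragment.
macro_rules! typed {
    ($t:ty $b:block) => {{ let value: $t = $b; value }};
}

pub fn make_entry() -> (&'static str, i32) {
    entry!("answer" => 40 + 2)
}

pub fn make_typed() -> u8 {
    typed!(u8 { 3 * 4 })
}
```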
Restrict merged repetitions to metavars with the same amount of repetitions
You cannot merge together repetitions whose metavars do not have the same number of repetitions:
macro_rules! tuplomatron {
    ($($e:expr),* ; $($f:expr),*) => { ( $( ( $e, $f ) ),* ) };
}

let tuple = tuplomatron!(1, 2, 3; 4, 5, 6); // valid
let tuple = tuplomatron!(1, 2, 3; 4, 5); // invalid since both metavars do not have the same amount of repetitions
The valid invocation gets expanded properly into one big tuple:
let tuple = TupleExpr:
 outer attributes: none
 inner attributes: none
 Tuple elements:
  TupleExpr:
   outer attributes: none
   inner attributes: none
   Tuple elements:
    1
    4
  TupleExpr:
   outer attributes: none
   inner attributes: none
   Tuple elements:
    2
    5
  TupleExpr:
   outer attributes: none
   inner attributes: none
   Tuple elements:
    3
    6
final expression: none
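The same expansion can be checked at the value level. The sketch below wraps the valid invocation in a function, in standard Rust; the zipped function name is an assumption made for this example.

```rust
macro_rules! tuplomatron {
    ($($e:expr),* ; $($f:expr),*) => { ( $( ( $e, $f ) ),* ) };
}

// Both repetitions have three elements, so `$e` and `$f` can be merged
// pairwise inside a single transcriber repetition.
pub fn zipped() -> ((i32, i32), (i32, i32), (i32, i32)) {
    tuplomatron!(1, 2, 3; 4, 5, 6)
}
```

Note that with a single element on each side the outer parentheses no longer form a tuple, so the invocation would simply produce one pair.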
Handle :tt fragments properly
Having :tt fragments handled properly allows us to delve into the world of tt-munchers, a very powerful pattern which allows the implementation of extremely complex behaviors or DSLs. The target code we’re using for this comes directly from The Little Book of Rust Macros by Lukas Wirth, adapted to fit our non-println-aware compiler.
extern "C" {
    fn printf(fmt: *const i8, ...);
}

fn print(name: &str, value: i32) {
    unsafe {
        printf(
            "%s = %d\n\0" as *const str as *const i8,
            name as *const str as *const i8,
            value,
        );
    }
}

macro_rules! mixed_rules {
    () => {{}};
    (trace $name_str:literal $name:ident; $($tail:tt)*) => {
        {
            print($name_str, $name);
            mixed_rules!($($tail)*);
        }
    };
    (trace $name_str:literal $name:ident = $init:expr; $($tail:tt)*) => {
        {
            let $name = $init;
            print($name_str, $name);
            mixed_rules!($($tail)*);
        }
    };
}

fn main() {
    mixed_rules! (trace "a\0" a = 14; trace "a\0" a; trace "b\0" b = 15;);
}
This is now handled by gccrs, and produces the same output as rustc.
~/G/gccrs > rustc tt-muncher.rs
~/G/gccrs > ./tt-muncher
a = 14
a = 14
b = 15
~/G/gccrs > gccrs tt-muncher.rs -o tt-muncher-gccrs
~/G/gccrs > ./tt-muncher-gccrs
a = 14
a = 14
b = 15
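For readers new to the pattern, a smaller tt-muncher may help: each recursive step consumes one token tree from the front of the input and forwards the rest. This is a minimal sketch in standard Rust, not checked against gccrs here; the count_tts macro and count_demo function are hypothetical names for this example.

```rust
// Counts token trees by consuming one `tt` per recursive step.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

pub fn count_demo() -> usize {
    // `a`, `+`, `(b c)` and `;` are four token trees: the parenthesised
    // group counts as a single tree.
    count_tts!(a + (b c) ;)
}
```

The recursion depth grows with the input length, so long inputs can hit the macro recursion limit.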