Thanks again to Open Source Security, inc and Embecosm for their ongoing support for this project.
Milestone Progress
February was a big month for GCC Rust; our previous Google Summer of Code student Arthur Cohen joins us in Embecosm, Germany, working on the compiler full time. With the additional resource, we can split up work and delegate tasks allowing multiple streams of complex work to take place, which frees Philip up to work on Slice’s and bugs in the type system.
Concerning our macro expansion milestone, we have the building blocks in place and support most declarative macros though there are some quirks and bugs to work through still. The remaining time of this milestone will be spent on setting up the builtin macros and fixing bugs.
Other than macros, there has been a focus on code cleanup and bug fixing in February, which has been fruitful.
Monthly Community Call
It’s time for our next community call, feel free to join in! 🙂
- Date and Time 4th March 2022 at: 14h00 UTC
- Agenda: https://hackmd.io/KkrXwApIQgWwaUL8fl1Yrg please feel free to add agenda items you wish to see discussed.
- Jitsi: https://meet.jit.si/ArtificialPantsFlashNeither
Completed Activities
- Bug Fix canonical-path of impl blocks nested under modules PR900
- Bug Fix enum discriminant values to use constexpr code PR902
- Apply cfg expansion to the rest of the crate PR904
- Track canonical-path and location into as part of the type-check pass PR903
- Apply .c to .cc rename as part of merge from upstream PR906
- Support any,not,all predicates in cfg-expansion PR907
- Enable GCC self-test framework PR751
- Fix diagnostic formatting issues -Wformat-diag PR908
- Improve location info on match-arms PR888
- Support key value pairs being passed in -frust-cfg PR909
- Refactor TypeKind ToString from header to implementation PR911
- Bug Fix multiple generic substitution on path expressions PR912
- Support the inline attribute PR916 PR922
- Reuse C/C++ front-end mark_addressable code PR917
- Cleanup code generation to remove duplication PR918
- Support deref_mut lang item during method resolution PR920
- Add Macro expansion tests PR926
- BugFix CFG expansion values require quotes PR931 PR935
- Add cargo-gccrs to our DockerFile PR937
- Add missing location into to macros PR934 PR933 PR932
- Add support for Macro Expansion PR938
- Add missing location into to AST PR940
- replace lambda with std::vector reference in AST::PathPattern PR942
- Add clear_errors to make parser reuseable for macro expansion PR944
- Support matching macro repetition rules PR950
- Add name resolution for slice’s PR951
- Repetition macros PR955 PR965 PR956
- Fix unresolved test-case PR964
- Refactor lang item mappings PR953
- Add builtin macro framework PR969
- Add Support for index and Range lang items and boiler plate for Slice typechecking PR974
- Add file! builtin macro PR970
Contributors this month
Overall Task Status
Category | Last Month | This Month | Delta |
TODO | 101 | 118 | +17 |
In Progress | 19 | 17 | -2 |
Completed | 273 | 297 | +24 |
Test Cases
Category | Last Month | This Month | Delta |
Passing | 5617 | 6068 | +451 |
Failed | – | – | – |
XFAIL | 21 | 21 | – |
XPASS | – | – | – |
Bugs
Category | Last Month | This Month | Delta |
TODO | 34 | 40 | +6 |
In Progress | 5 | 5 | – |
Completed | 102 | 109 | +7 |
Milestones Progress
Milestone | Last Month | This Month | Delta | Start Date | Completion Date | Target |
Data Structures 1 – Core | 100% | 100% | – | 30th Nov 2020 | 27th Jan 2021 | 29th Jan 2021 |
Control Flow 1 – Core | 100% | 100% | – | 28th Jan 2021 | 10th Feb 2021 | 26th Feb 2021 |
Data Structures 2 – Generics | 100% | 100% | – | 11th Feb 2021 | 14th May 2021 | 28th May 2021 |
Data Structures 3 – Traits | 100% | 100% | – | 20th May 2021 | 17th Sept 2021 | 27th Aug 2021 |
Control Flow 2 – Pattern Matching | 100% | %100 | – | 20th Sept 2021 | 9th Dec 2021 | 29th Nov 2021 |
Macros and cfg expansion | 18% | 65% | +47% | 1st Dec 2021 | – | 28th Mar 2022 |
Imports and Visibility | 0% | 0% | – | 29th Mar 2022 | – | 27th May 2022 |
Const Generics | 0% | 0% | – | 30th May 2022 | – | 25th Jul 2022 |
Intrinsics | 0% | 0% | – | 6th Sept 2021 | – | 30th Sept 2022 |
Risks
Risk | Impact (1-3) | Likelihood (0-10) | Risk (I * L) | Mitigation |
Rust Language Changes | 3 | 7 | 21 | Keep up to date with the Rust language on a regular basis |
Going over target dates | 3 | 5 | 15 | Maintain status reports and issue tracking to stakeholders |
Planned Activities
- Continue work on builtin macros
- Continue work into Slices
- Support the mutable context during type checking for dereference or array index operations
- Create more good-first-pr’s
Detailed changelog
Canonical-paths
We have improved our canonical-path tracking so that we can build up paths for the legacy mangling scheme. So for example impl blocks nested under modules are given a prefix of impl in their path.
struct Foo(i32);
mod A {
impl Foo {
fn test(&self) -> i32 {
self.0
}
}
}
fn test() {
let a = Foo(123);
let b:i32 = a.test();
}
As you can see we have the crate-name of example -> structure A -> impl block for example::A -> function name test.
i32 example::A::<impl example::Foo>::test (const struct example::Foo & const self)
{
i32 D.85;
D.85 = self->0;
return D.85;
}
void example::test ()
{
const struct example::Foo a;
const i32 b;
try
{
a.0 = 123;
b = example::A::<impl example::Foo>::test (&a);
}
finally
{
a = {CLOBBER};
}
}
see: https://godbolt.org/z/P94an5f5W
cfg expansion with predicates
We added support for any, all and not predicates on cfg expansions so in this example this ensures that both A and B are specified for the all predicate.
struct Foo;
impl Foo {
#[cfg(all(A, B))]
fn test(&self) {}
}
fn main() {
let a = Foo;
a.test();
}
see: https://godbolt.org/z/sW9K19EqE
Key-value cfg-expansion
Rust allows us to specify key-value pairs for config expansion this is mostly associated with host/os/cpu options such as os = “linux” for example but here is an example below you can try in compiler explorer.
struct Foo;
impl Foo {
#[cfg(A = "B")]
fn test(&self) {}
}
fn main() {
let a = Foo;
a.test();
}
see: https://godbolt.org/z/7YT1jMMMz
inline attributes
In Rust the inline attribute takes three forms:
- #[inline]
- #[inline(always)]
- #[inline(never)]
Inline without any option is analogous to C style inline keyword giving a hint to the compiler that this function is a good candidate for inlining. Inline always can be acheived with GCC’s inline always attribute: https://gcc.gnu.org/onlinedocs/gcc/Inline.html. Finally never we can mark functions as DECL_UNINLINEABLE. The one difference is that inline optimizations require optimizations to be enabled. So when compiling at -O0 no inlining will occur, any level greater than this, the inline pass will be enforced.
We have always added some simple error handling for bad inline options such as:
#[inline(A)]
fn test() {}
test.rs:2:3: error: unknown inline option
2 | #[inline(A)]
| ^
#[inline(A,B)]
fn test() {}
test.rs:5:3: error: invalid number of arguments
5 | #[inline(A, B)]
| ^
deref_mut lang item
Work on method resolution has continued steadily and we now support the deref_mut lang item so that for methods that require a &mut self reference we try to lookup any relevant deref_mut lang item to get the indirection required from the receiver.
extern "C" {
fn printf(s: *const i8, ...);
}
#[lang = "deref"]
pub trait Deref {
type Target;
fn deref(&self) -> &Self::Target;
}
#[lang = "deref_mut"]
pub trait DerefMut: Deref {
fn deref_mut(&mut self) -> &mut Self::Target;
}
impl<T> Deref for &T {
type Target = T;
fn deref(&self) -> &T {
*self
}
}
impl<T> Deref for &mut T {
type Target = T;
fn deref(&self) -> &T {
*self
}
}
pub struct Bar(i32);
impl Bar {
pub fn foobar(&mut self) -> i32 {
self.0
}
}
pub struct Foo<T>(T);
impl<T> Deref for Foo<T> {
type Target = T;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl<T> DerefMut for Foo<T> {
fn deref_mut(&mut self) -> &mut Self::Target {
unsafe {
let a = "mut_deref\n\0";
let b = a as *const str;
let c = b as *const i8;
printf(c);
}
&mut self.0
}
}
pub fn main() -> i32 {
let bar = Bar(123);
let mut foo: Foo<Bar> = Foo(bar);
let foobar = foo.foobar();
foobar - 123
}
See https://godbolt.org/z/xcM9ohcjK
Declarative Macro Expansion
We have merged our first pass of the macro expansion pass. The approach taken here is that we reuse our existing parser to call the apropriate functions as specified as part of the MacroFragmentType enum if the parser does not have errors parsing that item then it must be a match. Then once we match a rule we have a map of the token begin/end offsets for each fragment match, this is then used to adjust and create a new token stream for the macro rule definition so that when we feed it to the parser the tokens are already substituted. The resulting expression or item is then attached to the respective macro invocation and this is then name resolved and used for hir lowering.
In this example the macro has two rules so we demonstrate that we match the apropriate rule and transcribe it respectively.
macro_rules! add {
($a:expr,$b:expr) => {
$a + $b
};
($a:expr) => {
$a
};
}
fn main() -> i32 {
let mut x = add!(1);
x += add!(2, 3);
x - 6
}
Another exmaple:
macro_rules! Test {
($a:ident, $b:ty) => {
struct $a($b);
};
}
Test!(Foo, i32);
fn main() -> i32 {
let a = Foo(123);
a.0 - 123
}
Here we take into account the context of the macro invocation and parse it into AST::Items. In the even of failure to match a rule the compiler error looks like the following:
<source>:11:17: error: Failed to match any rule within macro
1 | macro_rules! add {
| ~
......
11 | let mut x = add!(1, 2, 3);
| ^
More error handling has been added for when the transcribed rule actually is not fully used so for example:
<source>:4:9: error: tokens here and after are unparsed
4 | struct BAD($b);
| ^
see: https://godbolt.org/z/TK3qdG56n
Range Lang items
In rust ranges are turned into structs so what seems like piece of syntax to specify some kind of constraint is actually something which can be assigned and manipulated. This is one of the building blocks in our journey to support slices.
#[lang = "RangeFull"]
pub struct RangeFull;
#[lang = "Range"]
pub struct Range<Idx> {
pub start: Idx,
pub end: Idx,
}
#[lang = "RangeFrom"]
pub struct RangeFrom<Idx> {
pub start: Idx,
}
#[lang = "RangeTo"]
pub struct RangeTo<Idx> {
pub end: Idx,
}
#[lang = "RangeInclusive"]
pub struct RangeInclusive<Idx> {
pub start: Idx,
pub end: Idx,
}
fn test() {
let a = 1..2; // range
let b = 1..; // range from
let c = ..3; // range to
let d = 0..=2; // range inclusive
}
See: https://doc.rust-lang.org/std/ops/struct.Range.html
Index Lang items
Another building block to support slices is the ability to suport the index lang item core::ops::index so that a range can be an argument and the code in core::slice::index can actually become the starting point in giving us a slice from an array.
#[lang = "index"]
trait Index<Idx> {
type Output;
fn index(&self, index: Idx) -> &Self::Output;
}
struct Foo(i32, i32);
impl Index<isize> for Foo {
type Output = i32;
fn index(&self, index: isize) -> &i32 {
if index == 0 {
&self.0
} else {
&self.1
}
}
}
fn main() -> i32 {
let a = Foo(1, 2);
let b = a[0];
let c = a[1];
c - b - 1
}
See: https://doc.rust-lang.org/core/ops/trait.Index.html
Repetition Macros
Matching macro repetitions
Macro match arms can contain repetition operators, which indicate the possibilty of passing multiple instances of a single token or metavariable.
You can denote such repetitions using Kleene operators: Three variants are available, ?
, +
and *
. Each corresponds to varying bounds on the amount of tokens associated with the operator, similarly to regular expressions.
macro_rules! kleene {
($a:ident $(,)?) => {{ }};
($($i:literal tok)+) => {{ }};
($($e:expr)*) => {{ }};
}
The declaration above contains three possible matching invocations:
- Either a singular identifier, with zero or one commas (pattern:
<comma>
, kleene operator:?
(0 -> 1)) - One or more literal followed by the separator
tok
(pattern$i:literal tok
, kleene operator:+
(1 ->+inf
)) - Zero or more expressions
tok
(pattern$e:expr
, kleene operator:*
(0 ->+inf
))
The first of implementing macro repetitions comes in matching the actual patterns given to the users. We are now able to match simple repetitions, with a few limitations and bugs still. For example, the Rust reference specifies valid separators to use after fragment specifiers, which we do not check yet. It is for example forbidden to add an identifier after an $<>:expr
specifier, since that could cause ambiguity: The only allowed separators after an expression are thus =>
, <comma>
or ;
.
See: https://doc.rust-lang.org/reference/macros-by-example.html#follow-set-ambiguity-restrictions
Once those repetition patterns are matched, it is easy to figure out how many repetitions of said pattern were given by the user. We store this data alongside the rest of the fragment, to make sure that we expand said pattern a correct amount of times when transcribing.
Given the following match arm:
macro_rules! lit_plus_tok {
($($e:literal tok)*) => {}
}
And the following invocation:
lit_plus_tok!("rustc" tok 'v' tok 1.59 tok);
we will have matched the repetition 3 times, and attributed a repetition amount of 3 to the $e
meta-variable.
See: https://doc.rust-lang.org/rust-by-example/macros/repeat.html and https://doc.rust-lang.org/reference/macros-by-example.html#repetitions
Expanding macro repetitions
Following the matching of these repetitions, we can recursively expand all tokens contained in the pattern.
Considering once again the previous declaration and invocation, we can parse the following pattern as the one to expand:
($e:literal tok)
This pattern is then recursively expanded as if it was a regular macro invocation. In order to make sure that each meta-variable gets expanded correctly, we only give a subset of the matched fragments to the new subsitution context.
macro_rules! lit_plus_tok {
($($e:literal tok)*) => {}
}
lit_plus_tok!("rustc" tok 'v' tok 1.59 tok);
// Original matched fragments: { "lit": ["rustc", 'v', 1.59] }
// We then expand the repetition pattern once with { "lit": ["rustc"] },
// once with { "lit": ['v'] },
// and finally once with { "lit": [1.59] },
Once again, certain restrictions apply, which we have yet to implement: Some specifiers get expanded eagerly, while some stay under the form inputted by the user.
See: https://doc.rust-lang.org/reference/macros-by-example.html#transcribing
Likewise, not all repetition patterns are covered properly. Some issues remain to be ironed out for a complete and correct implementation.
Builtin macros
In order to implement some specific behaviour, the rust standard library requires some macros to be built into the compiler. You can find a full list here.
gccrs
should implement to allow for the compilation of the standard rust library, as both core
and std
depend on a multitude of them.
These macros are defined as empty within the core library, and their transcriber is provided in the compiler as a simple function. We implement those builtins in gccrs
as functions returning fragments of abstract syntax trees, which are inserted during the macro-expansion phase and then lowered to an intermediate representation alongside the rest of the user’s code.
We have a long list of macros ahead of us, some of which we should be able to implement easily. If you are interested in contributing, we have opened 3 good first issues regarding builtin macros with detailed guides on how to solve them.
Thanks a lot to bjorn3 for all the help regarding builtin macros and their implementation details.