Procedural macros
Procedural macros (often abbreviated proc macros) are a mechanism in Rust that takes input code, modifies it and outputs valid Rust code. They differ from macros by example (MBEs).
A procedural macro must be declared in an external crate rather than directly in the crate containing the code to modify. This implies at least two compilation passes: the procedural macro crate is compiled first, then the code that uses the macro. Note that multiple procedural macros may live in the same crate.
Procedural macros must reside in the root of their crate.
There are three kinds of procedural macros that should be used depending on the intent as well as the context.
trait Titi {}
#[derive(Titi)] // Derive proc macro invocation.
struct Toto;
#[tata] // Attribute proc macro.
fn test() {
    tutu!(); // Bang/function-like proc macro, invoked the same way as "regular" macros.
}
These macros are compiled as a shared object that is dynamically loaded during the compiler's code expansion pass. In gccrs, the compiler converts the part of the AST that should be expanded back to tokens, then converts those tokens to procedural macro types, which are very similar to tokens. Those types are contained in a TokenStream structure akin to std::vector<Token>.
Attribute procedural macros can be applied to items, trait implementations as well as trait definitions.
#[proc_macro_attribute]
pub fn my_attribute_proc_macro(attr: TokenStream, item: TokenStream) -> TokenStream
Custom derive procedural macros allow the automatic implementation of a given trait. The trait name is defined in the macro definition's attribute.
#[proc_macro_derive(TraitName)]
pub fn my_derive_proc_macro(item: TokenStream) -> TokenStream
Helper attributes may also be declared in the following manner:
#[proc_macro_derive(HelperAttr, attributes(helper))]
pub fn my_derive_proc_macro(item: TokenStream) -> TokenStream
Function-like procedural macros are declared with the proc_macro attribute:
#[proc_macro]
pub fn my_function_proc_macro(item: TokenStream) -> TokenStream
Note that even though function-like and derive procedural macros share the same function prototype, a function cannot be annotated as both.
Macros are expanded from the outermost macro to the innermost one (lazily). So in the following situation:
#[alpha]
#[beta]
pub fn order() -> i32 {
    42
}
alpha will see the beta attribute but not itself, while beta won't see alpha.
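The ordering rule can be modelled outside the compiler. The following is a minimal sketch, assuming each attribute macro returns the item unchanged so that the attributes listed after it stay attached. The helper visible_attributes is hypothetical and is not part of gccrs or of the proc_macro crate:

```rust
/// For each attribute macro, in source order (outermost first), returns the
/// attributes that are still attached to the item when that macro runs.
/// Assumption: every macro leaves the remaining attributes untouched.
pub fn visible_attributes<'a>(attrs: &[&'a str]) -> Vec<(&'a str, Vec<&'a str>)> {
    attrs
        .iter()
        .enumerate()
        // A macro has already been stripped from the item when it runs, so it
        // only sees the attributes written after it.
        .map(|(index, name)| (*name, attrs[index + 1..].to_vec()))
        .collect()
}
```

Called with the attributes alpha and beta, the helper pairs alpha with a list containing beta, and beta with an empty list.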
Multiple derive macros in the same directive such as in the following snippet will be applied from left to right:
#[derive(Gamma, Iota, Mu, Gamma)]
union TUnion {
    toto: usize,
    tata: f32,
}
As they are part of the same group, Iota will not see any call to Mu or Gamma. Gamma will be called twice.
Macro input must consist of valid Rust tokens: it need not be valid Rust code, but it must still be lexable. If we dump the input with the following macro, we'll see that tokens are not strings but rather seemingly complex enumerations:
#[proc_macro_attribute]
pub fn show_content_types(_attr: TokenStream, item: TokenStream) -> TokenStream {
    println!("{:?}", item);
    item
}
TokenStream [Ident { ident: "pub", span: #0 bytes(387..390) }, Ident { ident: "fn", span: #0 bytes(391..393) }, Ident { ident: "example", span: #0 bytes(394..401) }, Group { delimiter: Parenthesis, stream: TokenStream [], span: #0 bytes(401..403) }, Group { delimiter: Brace, stream: TokenStream [Ident { ident: "let", span: #0 bytes(410..413) }, Ident { ident: "a", span: #0 bytes(414..415) }, Punct { ch: '=', spacing: Alone, span: #0 bytes(416..417) }, Literal { kind: Float, symbol: "3.14", suffix: Some("f64"), span: #0 bytes(418..425) }, Punct { ch: ';', spacing: Alone, span: #0 bytes(425..426) }], span: #0 bytes(404..428) }]
Those types are defined in the proc_macro crate.
- TokenTree - Tagged union of a Punct, Ident, Group or Literal (see below).
- Punct - Single punctuation characters like ;, <, =. May form "complex" punct (e.g. <<, ==<, +<).
- Ident - Identifiers such as variable names, true, false, _ and reserved keywords (pub, as, async...)
- Group - Container for a TokenStream that may be enclosed with delimiters ((, {, [). A group with no delimiter is valid.
- TokenStream - Vector-like structure containing multiple TokenTree.
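How these types nest can be shown with a small stand-alone model. This is a simplified sketch, not the proc_macro definitions: the real types carry spans and hide their fields, whereas the enums and the count_idents and example_stream helpers below are hypothetical and exist only for illustration. The example rebuilds the token tree from the dump shown earlier.

```rust
// Simplified, hypothetical model of the proc_macro token types (no spans).
#[derive(Debug, Clone, PartialEq)]
pub enum Delimiter { Parenthesis, Brace, Bracket, None }

#[derive(Debug, Clone, PartialEq)]
pub enum Spacing { Alone, Joint }

#[derive(Debug, Clone, PartialEq)]
pub enum TokenTree {
    Punct { ch: char, spacing: Spacing },
    Ident(String),
    Group { delimiter: Delimiter, stream: TokenStream },
    Literal(String),
}

/// Vector-like container of token trees.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct TokenStream(pub Vec<TokenTree>);

/// Counts identifiers, descending recursively into nested groups.
pub fn count_idents(stream: &TokenStream) -> usize {
    stream
        .0
        .iter()
        .map(|tree| match tree {
            TokenTree::Ident(_) => 1,
            TokenTree::Group { stream, .. } => count_idents(stream),
            _ => 0,
        })
        .sum()
}

/// Builds the tokens of `pub fn example() { let a = 3.14f64; }`.
pub fn example_stream() -> TokenStream {
    let body = TokenStream(vec![
        TokenTree::Ident("let".to_string()),
        TokenTree::Ident("a".to_string()),
        TokenTree::Punct { ch: '=', spacing: Spacing::Alone },
        TokenTree::Literal("3.14f64".to_string()),
        TokenTree::Punct { ch: ';', spacing: Spacing::Alone },
    ]);
    TokenStream(vec![
        TokenTree::Ident("pub".to_string()),
        TokenTree::Ident("fn".to_string()),
        TokenTree::Ident("example".to_string()),
        TokenTree::Group { delimiter: Delimiter::Parenthesis, stream: TokenStream::default() },
        TokenTree::Group { delimiter: Delimiter::Brace, stream: body },
    ])
}
```

The top-level stream holds five token trees, and the function body only appears once the brace-delimited group is entered, which is why any traversal has to recurse through Group.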
Users interact with those types and their functions directly, which means we cannot break this API: it must stay the same as rustc's, since we want any valid code accepted by rustc to also be accepted by our compiler. There is a major problem though: this crate is closely tied to rustc, and some of its internal types depend on rustc's implementation.
We cannot bypass this situation as this crate is not only used by the user's procedural macro, but also by rustc. The latter uses it for the definition of the various types that can be found in the API.
Since we cannot use the proc_macro crate directly with gccrs, we need to create our own proc_macro crate so the compiler and the user's procedural macro can retrieve the type definitions and their associated functions.
The user's procedural macro will be written in Rust, so our proc_macro crate must expose Rust functions and types. But our compiler is written in C++, and can therefore only understand C/C++ types and functions. That's why we've created a "compatibility layer" through FFI.
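The shape of such a compatibility layer can be sketched as follows. Everything here is an assumption made for illustration: FfiIdent, ffi_ident_new and ffi_ident_drop are hypothetical names and do not reflect the actual gccrs interface. The two extern "C" functions stand in for the C++ side and are written in Rust only so that the example stays self-contained.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical C-compatible identifier shared by both sides of the boundary.
#[repr(C)]
pub struct FfiIdent {
    pub is_raw: bool,
    pub name: *mut c_char, // Owned, NUL-terminated buffer.
}

// Stand-in for a function the internal library would export.
pub extern "C" fn ffi_ident_new(name: *const c_char, is_raw: bool) -> FfiIdent {
    // Copy the caller's string so the returned value owns its buffer.
    let owned: CString = unsafe { CStr::from_ptr(name) }.to_owned();
    FfiIdent { is_raw, name: owned.into_raw() }
}

// Stand-in for the matching deallocation function.
pub extern "C" fn ffi_ident_drop(ident: *mut FfiIdent) {
    // Reclaim the buffer allocated in ffi_ident_new.
    unsafe { drop(CString::from_raw((*ident).name)) };
}

/// Rust-facing wrapper: the type a procedural macro author would manipulate.
pub struct Ident(FfiIdent);

impl Ident {
    pub fn new(name: &str) -> Ident {
        let c_name = CString::new(name).expect("identifier must not contain a NUL byte");
        Ident(ffi_ident_new(c_name.as_ptr(), false))
    }

    pub fn name(&self) -> String {
        unsafe { CStr::from_ptr(self.0.name) }.to_string_lossy().into_owned()
    }

    pub fn is_raw(&self) -> bool {
        self.0.is_raw
    }
}

impl Drop for Ident {
    fn drop(&mut self) {
        ffi_ident_drop(&mut self.0);
    }
}
```

The design point is that only the #[repr(C)] struct and the extern "C" functions cross the language boundary, while the safe wrapper keeps raw pointers and unsafe blocks out of user code.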
The user's procedural macro will be statically linked against a proc_macro library written in Rust, which is itself linked against an internal proc_macro library that the compiler could also use.
flowchart LR
U[my user procedural macro] -->|linked against| R[rust interface]
R -->|linked against| C[libproc_macro cpp]
G[gccrs] -->|linked against| C
Right now there is a C++ library named libproc_macro that should be renamed to something such as libproc_macro_internal, whilst the rust directory inside it should live on its own under the libproc_macro name.
Currently there are some unimplemented functions in libproc_macro that convert a given string to a TokenStream type as defined in the proc_macro crate, because they need to lex/parse the string. The lexer/parser will have to be split out of gccrs into its own module so that those functions can be implemented.
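To see why a lexer is needed for this conversion, consider the toy sketch below. It is not the gccrs lexer: the Token enum and the lex function are hypothetical, and the sketch ignores groups, strings, comments and multi-character operators. It only shows that a string has to be cut into identifiers, literals and punctuation before any token type can be built.

```rust
/// Hypothetical flat token type, much simpler than the proc_macro types.
#[derive(Debug, PartialEq)]
pub enum Token {
    Ident(String),
    Literal(String),
    Punct(char),
}

/// Toy lexer: splits the input into identifiers, numeric literals and
/// single-character punctuation, skipping whitespace.
pub fn lex(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_alphabetic() || c == '_' {
            // Identifier or keyword: letters, digits and underscores.
            let mut ident = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_alphanumeric() || c == '_' {
                    ident.push(c);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(ident));
        } else if c.is_ascii_digit() {
            // Numeric literal, including a fractional part and a suffix.
            let mut literal = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_ascii_alphanumeric() || c == '.' || c == '_' {
                    literal.push(c);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Literal(literal));
        } else {
            // Anything else is treated as a single punctuation character.
            tokens.push(Token::Punct(c));
            chars.next();
        }
    }
    tokens
}
```

A full implementation would additionally have to match delimiters to build nested groups and attach spans, which is why reusing the compiler's own lexer/parser is preferable to duplicating it.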
As such, the final organization might look more like this:
flowchart LR
U([my user procedural macro]) -->|linked against| R[rust interface]
R -->|linked against| C[libproc_macro cpp]
G[gccrs] -->|linked against| C
G -->|dynamically load| U
- The compiler interacts with a user procedural macro through calls to dlopen and dlsym.
- The user's procedural macro interacts with proc_macro types through Rust calls to the libproc_macro library.
- The proc_macro library, acting as an interface, interacts with the "internal" procedural macro library through FFI.
- The internal proc_macro library interacts with the parser/lexer through C++ function calls.
sequenceDiagram
GCCRS->>myprocmacro: Send a tokenstream
Note right of GCCRS: Transfer using dlopen
myprocmacro-->>GCCRS: Give back a tokenstream
loop CodeExpansion
myprocmacro->>myprocmacro: Process tokenstream and construct output tokenstream
libproc_macro->>myprocmacro: Provides tokenstream and other rust types
end
loop TokenConversion
GCCRS->>GCCRS: Convert tokens to libproc_macro types back and forth
libproc_macro_internal->>GCCRS: Provides cpp types for dlopen mechanism
end
libproc_macro->>libproc_macro_internal: Request allocations
libproc_macro_internal-->>libproc_macro: Return allocated type