One-liners for getting iterators over the expressions contained in the various types used in syn
What I wanted to achieve
In a recent project, mplusfonts (an adaptation of a TrueType font of the same name for use with a no_std graphics library), I wanted users of the #[strings::emit] helper attribute to be able to apply it to the statement that assigns the result of an expression to a variable; otherwise, they would have had to apply it to the expression itself. More generally, my goal was to allow the parent of any expression to carry the attribute, as shown in the example below.
// Before
let bitmap_font =
    #[strings::emit]
    mplus!(...);
// After
#[strings::emit]
let bitmap_font = mplus!(...);
The expression in this example is the invocation of the mplus! function-like macro; this is the expression that we will want to operate on when expanding the #[strings] macro.
How I got there
The syn crate offers a triplet of interfaces for traversing and manipulating the syntax tree:
- Functions in the fold module receive ownership of the node and are expected to return a value of the same type.
- Functions in the visit module get to borrow the node and inspect its value without modifying it.
- Functions in the visit_mut module get to borrow the node and can also mutate its fields.
The third module contains the trait that MacroVisitor implements.
https://github.com/immersum/mplusfonts/blob/v0.2.0/macros/src/strings/visitor/mac.rs
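To make the difference between the three traversal styles concrete, here is a minimal sketch that does not use syn at all. The Expr enum and the three functions are hypothetical stand-ins invented for this illustration; they only mirror the ownership pattern of each module (take and return, borrow, borrow mutably).

```rust
// Hypothetical stand-in for a syntax tree node; syn's real `Expr` is far richer.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Lit(i64),
    Neg(Box<Expr>),
}

// fold style: takes ownership of the node and returns a node of the same type.
fn fold_double(expr: Expr) -> Expr {
    match expr {
        Expr::Lit(n) => Expr::Lit(n * 2),
        Expr::Neg(inner) => Expr::Neg(Box::new(fold_double(*inner))),
    }
}

// visit style: borrows the node and only inspects it.
fn visit_count_lits(expr: &Expr) -> usize {
    match expr {
        Expr::Lit(_) => 1,
        Expr::Neg(inner) => visit_count_lits(inner),
    }
}

// visit_mut style: borrows the node mutably and edits it in place.
fn visit_mut_double(expr: &mut Expr) {
    match expr {
        Expr::Lit(n) => *n *= 2,
        Expr::Neg(inner) => visit_mut_double(inner),
    }
}
```

The visit_mut style is the one that fits here, because the tree can be edited in place without having to rebuild it.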
Although I could have reused the functions from the module that the trait methods call by default, I wanted to implement a solution that involved writing call chains for gathering child expressions from the various types of nodes; these would be run when a node was found that had been marked with the helper attribute.
While iterating over the expressions that we got, which is the subject of this blog post, we can filter the result set as needed.
for expr in exprs
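As a rough illustration of filtering the result set, the sketch below keeps only macro invocations. The two-variant Expr enum and the macros_only helper are assumptions made for this example; with syn, the equivalent check would be a match on the macro variant of its Expr type.

```rust
// Hypothetical stand-in for syn's `Expr`; only two variants are modelled.
#[derive(Debug, PartialEq)]
enum Expr {
    Lit(i64),
    Macro(String),
}

// Keeps only the macro invocations from any source of child expressions.
fn macros_only<'a>(
    exprs: impl IntoIterator<Item = &'a mut Expr>,
) -> impl Iterator<Item = &'a mut Expr> {
    exprs
        .into_iter()
        .filter(|expr| matches!(expr, Expr::Macro(_)))
}
```

Because the helper accepts any IntoIterator, it can sit on top of every per-node call chain discussed later.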
Options can be turned into iterators
As we look through the data structures of the node types, we can see the various field types that contain expressions:
- Some field types are simple expressions, others are the same with indirection:
Box<Expr>
- Some field types have a cardinality of 1 element, others contain 0..1 elements wrapped in an option:
Option<Expr>
- Options may also be combined with indirection:
Option<Box<Expr>>
- There are field types that contain n elements:
Punctuated<Expr, Comma>
- Other field types include a few that are complex ones:
Option<(Eq, Expr)>
The node types that interest us will have one or more of such fields. The most interesting part of the task was determining the transformation logic to get an impl Iterator<Item = &mut Expr> result for each node type when we have a &mut node argument.
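The field shapes listed above can each be mapped to an iterator with a short call chain. The following is a sketch under simplifying assumptions: Expr and EqToken are hypothetical stand-ins, and a Vec<Expr> takes the place of Punctuated<Expr, Comma> (which, as far as I know, offers iter_mut() in the same way).

```rust
use std::iter::once;

// Hypothetical stand-ins for syn's `Expr` and its `=` token type.
#[derive(Debug, PartialEq)]
struct Expr(i64);
struct EqToken;

// Box<Expr>: exactly one expression behind a pointer.
fn from_box(field: &mut Box<Expr>) -> impl Iterator<Item = &mut Expr> {
    once(&mut **field)
}

// Option<Expr>: zero or one expression.
fn from_option(field: &mut Option<Expr>) -> impl Iterator<Item = &mut Expr> {
    field.iter_mut()
}

// Option<Box<Expr>>: zero or one expression behind a pointer.
fn from_option_box(field: &mut Option<Box<Expr>>) -> impl Iterator<Item = &mut Expr> {
    field.as_deref_mut().into_iter()
}

// A list of n expressions (a Vec stands in for Punctuated here).
fn from_list(field: &mut Vec<Expr>) -> impl Iterator<Item = &mut Expr> {
    field.iter_mut()
}

// Option<(Eq, Expr)>: drop the token, keep the expression.
fn from_option_pair(field: &mut Option<(EqToken, Expr)>) -> impl Iterator<Item = &mut Expr> {
    field.iter_mut().map(|(_, expr)| expr)
}
```

Every function returns the same item type, so the results can be chained freely when a node has several such fields.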
There are multiple ways to get an iterator over a single value, for example. A slice literal is more compact than a function call when an impl IntoIterator<Item = &mut Expr> is sufficient, such as in the case of a for loop. It can also be extended to two or more elements. The iter::once function call, on the other hand, is more readable.
// Solution #1 - May also be written without .into_iter()
let exprs = [expr].into_iter();
// Solution #2
let exprs = once(expr);
A third solution would be to wrap the expression in an option: Some(expr).into_iter(). This works because options can also be turned into iterators. While we are on the subject of options, it is worth mentioning that an optional value can also be converted to a slice that is either empty or has one element. Turning that slice into an iterator is the preferred solution in the case of a for loop; iterating over the option directly would trigger the for_loops_over_fallibles lint (originally a clippy lint, now part of rustc).
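The sketch below puts the three single-value approaches side by side and then uses Option::as_mut_slice for the for loop case. Expr, bump_all, and demo are hypothetical names for this illustration, and as_mut_slice assumes a reasonably recent toolchain (to my knowledge it was stabilized in Rust 1.75).

```rust
use std::iter::once;

// Hypothetical stand-in for syn's `Expr`.
#[derive(Debug, PartialEq)]
struct Expr(i64);

// Accepts anything that can be turned into an iterator of `&mut Expr`.
fn bump_all<'a>(exprs: impl IntoIterator<Item = &'a mut Expr>) {
    for expr in exprs {
        expr.0 += 1;
    }
}

fn demo() -> (Expr, Option<Expr>) {
    let mut expr = Expr(0);
    bump_all([&mut expr]); // Solution #1: a one-element array literal
    bump_all(once(&mut expr)); // Solution #2: iter::once
    bump_all(Some(&mut expr)); // Solution #3: an option is also IntoIterator

    let mut maybe = Some(Expr(10));
    // An empty or one-element slice; looping over it avoids the
    // for_loops_over_fallibles warning that `for expr in &mut maybe` triggers.
    for expr in maybe.as_mut_slice() {
        expr.0 += 1;
    }
    (expr, maybe)
}
```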
List of types that carry attributes and contain expressions
1. Types representing variables and constants
The most common use case of the attribute is going to be in the context of some sort of initialization, so that is where we will first have a look at gathering expressions.
1.1. Local let bindings
- Local - Contains 0..1 initializations, each of which contains 1 expression after the = token, then 0..1 expressions after an optional else keyword (the diverge expression).
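A possible call chain for this node type is sketched below. The structs are simplified, hypothetical models; the field names init, expr, and diverge follow syn 2 as I understand it, and all other fields (attributes, pattern, tokens) are left out.

```rust
use std::iter::once;

// Hypothetical, simplified models of the syn types involved.
#[derive(Debug, PartialEq)]
struct Expr(&'static str);
struct ElseToken;

struct LocalInit {
    expr: Box<Expr>,
    diverge: Option<(ElseToken, Box<Expr>)>,
}

struct Local {
    init: Option<LocalInit>,
}

// Yields the initializer expression first, then the diverge expression if any.
fn local_exprs(local: &mut Local) -> impl Iterator<Item = &mut Expr> {
    local.init.iter_mut().flat_map(|init| {
        let diverge = init.diverge.iter_mut().map(|(_, expr)| &mut **expr);
        once(&mut *init.expr).chain(diverge)
    })
}
```

The outer option contributes zero or one LocalInit, and flat_map flattens the one or two expressions inside it into a single sequence.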
1.2. Static initializations
- ItemStatic - Contains 1 constant expression after the = token.
1.3. Constant declarations
- ItemConst - Contains 1 constant expression after the = token.
- ImplItemConst - Contains 1 constant expression after the = token (inside an impl block).
- TraitItemConst - Contains 0..1 constant expressions after an optional = token (the default constant expression, which may be omitted in a trait definition).
2. Types representing type definitions
There are a few places in Rust where constant expressions can be parts of more complex syntax tree nodes.
2.1. Const generic parameters
- ConstParam - Contains 0..1 constant expressions after an optional = token (the default constant expression, which may be omitted).
2.2. Enum definitions
- ItemEnum - Contains n variants.
- Variant - Each variant contains 0..1 constant expressions after an optional = token (the discriminant of the variant, which may have an explicit value set).
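For enums, the chain has to go through the variants first. The sketch below uses hypothetical, simplified models in which a Vec stands in for the punctuated list of variants; the field names variants and discriminant follow syn as I understand it.

```rust
// Hypothetical, simplified models of the syn types involved.
#[derive(Debug, PartialEq)]
struct Expr(i64);
struct EqToken;

struct Variant {
    discriminant: Option<(EqToken, Expr)>,
}

struct ItemEnum {
    variants: Vec<Variant>,
}

// Yields the explicit discriminant of every variant that has one.
fn enum_exprs(item: &mut ItemEnum) -> impl Iterator<Item = &mut Expr> {
    item.variants
        .iter_mut()
        .flat_map(|variant| variant.discriminant.iter_mut().map(|(_, expr)| expr))
}
```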
3. Types representing expressions
The remaining syntax tree node types listed here represent expressions and have fields whose types also represent expressions.
3.1. Slice and tuple expressions
- ExprArray - Contains n expressions inside the [] tokens (the elems).
- ExprTuple - Contains n expressions inside the () tokens (the elems).
3.2. Struct expressions
- ExprStruct - Contains n field-value pairs, followed by 0..1 expressions after an optional .. token (the base struct, which provides the rest of the values).
- FieldValue - Each field-value pair contains 1 expression after the : token.
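A struct expression combines a list with an optional trailing expression, which might be handled as in the sketch below. The types are hypothetical, simplified models; a Vec stands in for the punctuated list, and the field names fields, expr, and rest follow syn 2 as I understand it.

```rust
// Hypothetical, simplified models of the syn types involved.
#[derive(Debug, PartialEq)]
struct Expr(i64);

struct FieldValue {
    expr: Expr,
}

struct ExprStruct {
    fields: Vec<FieldValue>,
    rest: Option<Box<Expr>>,
}

// Yields the value of every field, then the base struct expression if present.
fn struct_exprs(node: &mut ExprStruct) -> impl Iterator<Item = &mut Expr> {
    let fields = node.fields.iter_mut().map(|field| &mut field.expr);
    // `chain` accepts any IntoIterator, so the option can be passed directly.
    fields.chain(node.rest.as_deref_mut())
}
```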
3.3. If and while expressions
- ExprIf - Contains 1 expression after the if keyword (the cond expression), which is first followed by a block (the then_branch, which is itself not an expression), then another 0..1 expressions after an optional else keyword (the else_branch, which can be a block expression, but it can also be a different type of expression).
- ExprWhile - Contains 1 expression after the while keyword (the cond expression), which is followed by a block (the body, which is itself not an expression).
- ExprLet - Contains 1 expression after the = token, called the scrutinee.
3.4. For loop expressions
- ExprForLoop - Contains 1 expression after the in keyword (an expression that is either itself an iterator, or it can be turned into an iterator).
3.5. Range expressions
- ExprRange - Contains 0..1 expressions before and 0..1 expressions after the .. or ..= token (the start and end expressions).
- ExprRepeat - Contains 1 expression before and 1 expression after the ; token.
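With two optional boxed fields, a range expression is a good fit for as_deref_mut. The sketch below uses a hypothetical, simplified model of ExprRange; the field names start and end follow syn 2 as I understand it, and the range operator itself is omitted.

```rust
// Hypothetical, simplified models of the syn types involved.
#[derive(Debug, PartialEq)]
struct Expr(i64);

struct ExprRange {
    start: Option<Box<Expr>>,
    end: Option<Box<Expr>>,
}

// Yields the start expression (if any), then the end expression (if any).
fn range_exprs(node: &mut ExprRange) -> impl Iterator<Item = &mut Expr> {
    node.start
        .as_deref_mut()
        .into_iter()
        .chain(node.end.as_deref_mut())
}
```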
3.6. Match expressions
- ExprMatch - Contains 1 expression after the match keyword, called the scrutinee, followed by n arms.
- Arm - Each match arm contains 0..1 expressions after an optional if keyword (the guard expression), followed by 1 expression after the => token (the body expression, which can be a block, but it can also be a different type of expression).
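Match expressions nest one level deeper, since each arm brings its own optional guard and mandatory body. The sketch below is one way to flatten that; the types are hypothetical, simplified models (patterns are omitted), and the field names expr, arms, guard, and body follow syn 2 as I understand it.

```rust
use std::iter::once;

// Hypothetical, simplified models of the syn types involved.
#[derive(Debug, PartialEq)]
struct Expr(&'static str);
struct IfToken;

struct Arm {
    guard: Option<(IfToken, Box<Expr>)>,
    body: Box<Expr>,
}

struct ExprMatch {
    expr: Box<Expr>,
    arms: Vec<Arm>,
}

// Yields the scrutinee, then for each arm its guard (if any) and its body.
fn match_exprs(node: &mut ExprMatch) -> impl Iterator<Item = &mut Expr> {
    let arms = node.arms.iter_mut().flat_map(|arm| {
        let guard = arm.guard.iter_mut().map(|(_, expr)| &mut **expr);
        guard.chain(once(&mut *arm.body))
    });
    once(&mut *node.expr).chain(arms)
}
```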
3.7. Break and return expressions
- ExprBreak - Contains 0..1 expressions after the break keyword and an optional label.
- ExprReturn - Contains 0..1 expressions after the return keyword.
- ExprYield - Contains 0..1 expressions after the yield keyword.
3.8. Closure expressions
- ExprClosure - Contains 1 expression after the || tokens (the body expression, which can be a block, but it can also be a different type of expression).
3.9. Cast expressions
- ExprCast - Contains 1 expression before the as keyword.
3.10. Referencing and address-of operations
- ExprReference - Contains 1 expression after the & token and an optional mut keyword.
- ExprRawAddr - Contains 1 expression.
3.11. Try and await expressions
- ExprTry - Contains 1 expression before the ? token.
- ExprAwait - Contains 1 expression before the . token and the await keyword (the base expression).
3.12. Field access expressions
- ExprField - Contains 1 expression before the . token (the base expression).
3.13. Function and method call expressions
- ExprCall - Contains 1 expression before the () tokens (the func expression), followed by n expressions inside the () tokens (the args).
- ExprMethodCall - Contains 1 expression before the . token and the method identifier (the receiver, which will become the self argument of the method), followed by n expressions inside the () tokens (the args).
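Both call types follow the same pattern of one leading expression plus a list of arguments. The sketch below shows it with hypothetical, simplified models in which a Vec stands in for the punctuated argument list; the field names func, receiver, and args follow syn as I understand it.

```rust
use std::iter::once;

// Hypothetical, simplified models of the syn types involved.
#[derive(Debug, PartialEq)]
struct Expr(&'static str);

struct ExprCall {
    func: Box<Expr>,
    args: Vec<Expr>,
}

struct ExprMethodCall {
    receiver: Box<Expr>,
    args: Vec<Expr>,
}

// Yields the callee expression, then every argument.
fn call_exprs(node: &mut ExprCall) -> impl Iterator<Item = &mut Expr> {
    once(&mut *node.func).chain(node.args.iter_mut())
}

// Yields the receiver expression, then every argument.
fn method_call_exprs(node: &mut ExprMethodCall) -> impl Iterator<Item = &mut Expr> {
    once(&mut *node.receiver).chain(node.args.iter_mut())
}
```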
3.14. Index expressions
- ExprIndex - Contains 1 expression before the [] tokens, followed by 1 expression inside the [] tokens (the index).
3.15. Unary and binary operations
- ExprUnary - Contains 1 expression after the unary operator, called the operand.
- ExprBinary - Contains 1 expression before and 1 expression after the binary operator, called the left and right operands.
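For a fixed number of child expressions, the array literal approach extends naturally to two elements. The sketch below uses a hypothetical, simplified model of ExprBinary (the operator is omitted); the field names left and right follow syn as I understand it.

```rust
// Hypothetical, simplified models of the syn types involved.
#[derive(Debug, PartialEq)]
struct Expr(i64);

struct ExprBinary {
    left: Box<Expr>,
    right: Box<Expr>,
}

// Yields the left operand, then the right operand.
fn binary_exprs(node: &mut ExprBinary) -> impl Iterator<Item = &mut Expr> {
    // Written as a function call so that it yields the elements by value on
    // every edition; from edition 2021 on, `[..].into_iter()` does the same.
    IntoIterator::into_iter([&mut *node.left, &mut *node.right])
}
```

The same shape should apply to other nodes with exactly two child expressions, such as ExprAssign, ExprIndex, and ExprRepeat.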
3.16. Assignment expressions
- ExprAssign - Contains 1 expression before and 1 expression after the = token.
3.17. Parenthesized and non-parenthesized group expressions
- ExprParen - Contains 1 expression inside the () tokens.
- ExprGroup - Contains 1 expression inside invisible delimiters.
3.18. Blocks, try blocks, bodies of loops, etc.
There are syntax tree nodes that can have attributes applied, but these are expressions that contain a block instead of a child expression (like the then_branch of if expressions).
- ExprBlock, ExprConst, ExprUnsafe, ExprTryBlock, ExprAsync, ExprLoop - Contain 0 expressions.
3.19. Macro invocations
- ExprMacro - Contains a stream of subtrees with arbitrary nodes that serve as input to its invocation.
However, since this blog post describes a scenario where the expressions that we got to iterate over were going to be narrowed down to macro invocations in the first place, finding expressions nested in macro invocations would be something for another post.