Julia Macros for Beginners
jkrumbiegel.com

As always, I'm very glad to see that structural, Common Lisp-style macro systems, with the whole language available for macro construction, have been successfully adopted in other languages to the point where it's possible to explain them without a single mention of Lisp in the article, or even better, where a mention of Lisp anywhere except the very beginning would make the article worse by forcing an unnecessary detour.
pg's article on the topic, "What Made Lisp Different", [0] has aged poorly, and points 8 and 9 it makes (a notation for code using trees of symbols and the whole language always available) are no longer Lisp-specific. The final point, about "inventing a new dialect of Lisp", doesn't hold true either - as seen here, Julia is doing just fine not claiming to be another dialect of Lisp, even though many sources mention directly that it's Lisp-inspired.
Congrats to Julia people for the macro system and to the author for the article!
One could argue that Julia is just Lisp with a syntax that appeals more to popular taste.
There is also a secret option to get into a lisp repl in Julia "julia --lisp".
> There is also a secret option to get into a lisp repl in Julia "julia --lisp".
Wtf....... what? I just tried it, it's true. Is this some easter egg?
Julia's parser is written in Scheme using femtolisp. Look for *.scm files and the 'flisp' directory in the repo[1].
I will not look for such files, I'd rather forget this ever happened.
yeah it doesn't take much eye squinting to see lisp in julia. given these similarities i wonder if julia users could start to appreciate the s-expression syntax. i come from matlab then python background and i have come to really enjoy the s-expression syntax. modern IDE tools have made s-expression code as readable as pythonic pseudo-code-like syntax while affording the programmer unrivaled editing power
As it happens, in addition to the femtolisp in the Julia parser, there is actually a secret s-expression syntax for Julia itself. There's no built in REPL mode for it, but you can hack one in about a dozen lines: https://gist.github.com/brenhinkeller/44051118c2f9d18b26dc76...
nice! one annoying nit pick for me though is using commas as data separators. when you need to input data by hand into a multi dim array this can get annoying very quickly
There's https://github.com/swadey/LispSyntax.jl for that.
It's true; I think this syntax was probably made more for reading than writing since the main place it appears in the base language is just `Meta.show_sexpr`, but it's still interesting to play around with, and parsing it has some fun properties like that you can use Julia's standard syntax as effectively a preprocessor syntax for the s-expression syntax.
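For anyone curious, here's a small sketch of what that reading-oriented form looks like (`Meta.show_sexpr` ships with Base; the expression here is just an example):

```julia
# Print a Julia expression in s-expression form. The parser has already
# resolved operator precedence, so the tree shows the grouping explicitly.
ex = :(a + b * 2)
Meta.show_sexpr(ex)
# prints the nested (:call, :+, :a, (:call, :*, :b, 2)) structure
```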
Sure, that's always an option - same as Dylan, in a way!
Julia does take inspiration from Dylan and Scheme (the parser is even written in it), so it makes sense, I guess.
Macros were part of the "holy shit" moment for me for Lisp, in particular the Common Lisp Object System. I hadn't fully realized that it was possible to add a whole new paradigm to a language as a library [1], and moreover a particularly nice implementation of that paradigm.
After that, I realized that macros aren't always something that needs to be avoided; in the right hands they're immensely powerful.
I've only played a little with Julia macros, but it seems like they learned a lot of Lisp's lessons, so I support it wholly.
[1] I wasn't aware of how Objective C was built at the time.
The frustration of Julia macros for me was never knowing what AST would be produced for a given expression. This is a bit more manageable if you have a typed AST (e.g. OCaml but ppxes have other issues) or an obvious one (e.g. lisp). I like the way rust handles it where the macros operate on a tree of non-delimiter leaves and [delimiter, subtree list, delimiter] nodes which can allow for figuring out what the input to a macro will be more easily and for more varied macro syntax. Other languages that want a full AST before macros force the macro input to be a bit more AST-like, e.g. the Julia parser picks operator precedence and OCaml won’t let you use _ as an identifier.
Maybe it is better now but when I looked at macros ~5 years ago some language update changed the ast produced by the parser and I basically gave up.
I like that Julia offers some macro-like techniques that replace a lot of the cases where one might use a macro for performance reasons.
Tip: the first move when trying to write a macro is doing `Meta.@dump` on examples of argument expressions you want your macro to consume and produce. Then write code that transforms the inputs to the outputs.
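As a sketch of that workflow (the `@flipadd` macro below is a made-up toy, not from the article):

```julia
# Step 1: inspect the AST for the kind of input you want to handle.
Meta.@dump x + 1    # shows the Expr(:call, :+, :x, 1) structure

# Step 2: write code that transforms inputs to outputs.
# This toy macro rewrites a top-level + into a -:
macro flipadd(ex)
    if ex isa Expr && ex.head == :call && ex.args[1] == :+
        ex.args[1] = :-
    end
    esc(ex)   # escape so symbols resolve in the caller's scope
end

@flipadd(10 + 3)   # evaluates 10 - 3
```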
Right but it wasn’t obvious to me (at the time) what a change to the source code would do to the ast, so it wasn’t easy to know all the cases to handle, especially with quasiquotation (I think double-backtick style programming was basically impossible).
For example, maybe you want to handle something that looks like:

    foo ~~> bar

In lisp syntax (and recall that is what the Julia ast is: everything is a head and then arguments) it might look like:

    (~~> foo bar) ; or (op ~~> foo bar)

But if you change to e.g.

    foo ~~> bar + 5

You might get

    (+ (~~> foo bar) 5)

Or

    (~~> foo (progn (+ bar 5)))

I don't remember what you got or which cases were tricky, only that I could never guess what the output of dump would be.
> ~5 years ago
Was the language even stable then?
5 years ago was right before 1.0, so a ton of stuff was being broken then, since it was the last chance to make breaking changes.
This is an excellent, clear introduction to a topic that’s not easy to explain to beginners.
Julia macros are pretty legit. Right up there with Rust macros.
never understood why this was better/ how it was different from regular functions. just seems like bugs/vulns waiting to happen
Being able to operate on expressions before they reach the compiler is very handy.
Consider how ergonomic testing is thanks to macros: https://docs.julialang.org/en/v1/stdlib/Test/
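For instance, because `@test` receives the unevaluated expression, a failing test can report the source expression alongside the values it evaluated to; a plain function would only ever see the final `true`/`false`:

```julia
using Test

x = 2
@test x + 1 == 3   # a failure here would print the source expression
                   # `x + 1 == 3` together with the evaluated comparison
```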
Here's an example of passing quasi-json to a plotting function: https://www.queryverse.org/VegaLite.jl/stable/userguide/vlpl... . This lets you essentially transliterate a VegaLite spec into Julia without needing to translate it into Julia.
Finally, macros that operate on dataframes let you write code that looks kind of like SQL, and is much more pleasant than working with functions: https://dataframes.juliadata.org/stable/man/querying_framewo...
It is not necessarily better in all cases and should not be overused: https://youtu.be/mSgXWpvQEHE?t=579
However, it is useful for providing nicer syntax and DSLs.
Some examples: https://stackoverflow.com/questions/58137512/why-use-macros-... https://www.juliafordatascience.com/animations-with-plots-jl... https://gist.github.com/MikeInnes/8299575
Some macro systems can create variables in a loop for you. You could make this macro,

    (define-all i 5 0) ;; creates i1 i2 i3 i4 i5 initialized to 0

That's somewhat impossible with functions. The closest you get is either an array/dict with only runtime error checking, or an external codegen program.

I wrote a post [0] about how to do this in Racket. The macro generates ORM code based on a given SQLite DB. Aka the compiler queries SQLite and generates table-column functions automatically.
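A rough Julia equivalent of the `define-all` macro above might look like this (the macro name is hypothetical; this is just a sketch):

```julia
# @define_all i 3 0 defines i1, i2, i3 in the caller's scope, all set to 0.
macro define_all(name, n, init)
    # Symbol(name, k) builds the names i1, i2, ...; esc makes them
    # visible in the caller's scope instead of being gensym'd by hygiene.
    defs = [:($(esc(Symbol(name, k))) = $init) for k in 1:n]
    Expr(:block, defs...)
end

@define_all i 3 0
(i1, i2, i3)   # all three variables now exist and hold 0
```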
More potential benefits are: Better static error messages (can implement a type system using macros, example here[1]), and controlling execution order (can add lazy computation semantics).
[0]: http://tech.perpetua.io/2022/01/generating-sqlite-bindings-w...
[1]: https://gist.github.com/srcreigh/f341b2adaa0fe37c241fdf15f37...
no one is proposing macros as a paradigm of programming. they simply give programmers expressive powers not afforded by use of regular functions. in a sense you can compare macros to c++ templating. know what you are doing and use sparingly
There are R metaprogramming techniques called non-standard evaluation. They have many similarities to the macro system in Julia.
That's true. However, I believe that many R programmers don't know when non-standard evaluation happens or what it is exactly. Functions with or without it cannot be told apart just by looking at the syntax.
While NSE enables the dplyr syntax that many people enjoy, for me it's too magic and I have trouble reasoning about variable names in other people's code.
What does dplyr syntax look like?
Let's say you have a data frame

    df = tibble(a = c(1, 2))

and you want to use a dplyr verb to modify it

    mutate(df, b = a + 1)

the `a` in the above expression refers to the column in `df`, but this means it's hard to reference a variable in the outer scope named `a`. Furthermore, if you have a string referring to the column name `"a"`, you can't simply write

    mutate(df, b = a_var + 1)

Contrast this with DataFramesMeta.jl, which is a dplyr-like library for Julia, written with macros.

    df = DataFrame(a = [1, 2])
    @transform df :b = :a .+ 1

Because of the use of Symbols, there is no ambiguity about scopes. To work with a variable referring to column `a` you can write

    a_str = "a"
    @transform df :b = $a_str .+ 1

I won't pretend this isn't more complicated or harder to learn. Some of the complexity is due to Julia's high performance limiting non-standard evaluation in subtle ways. But a core strength of Julia's macros is that it's easy to inspect these expressions and understand exactly what's going on, with `@macroexpand` as shown in the blog post.

DataFramesMeta.jl repo: https://github.com/JuliaData/DataFramesMeta.jl
To reference variables in the outer scope, you would do

    mutate(df, b = .env$a + 1)

And if you have a string (contained in a_var) which identifies a variable you can do

    mutate(df, b = .data[[a_var]] + 1)

You could argue these feel clumsy, but I wouldn't say it's "hard" to do either of these things with dplyr.

I don't think it's just about whether it's hard to do; your syntax example looks short enough and one can memorize these two patterns relatively quickly.
However, both patterns are another special case how identifiers are resolved in the expression. Aren't `.env` and `.data` both valid variable and column names? So what happens if I have a column named `.data`?
Another example, which is the reason why we chose the `:column` style to refer to columns in `DataFramesMeta.jl` and `DataFrameMacros.jl`:
What happens if you have the expression `mutate(df, b = log(a))`? Both `log` and `a` are symbols, but `log` is not treated as a column. Maybe that's because it's used in a function-like fashion? Maybe because R looks at the value of `log` and `a` in their scope and sees that `log` is a function and `a` isn't?
In Julia DataFrames, it's totally valid to have a column that stores different functions. With the dplyr like syntax rules it would not be possible to express a function call with a function stored in a column, if the pattern really is that function syntax means a symbol is not looked up in the dataframe anymore.
In Julia DataFrameMacros.jl for example, if you had a column named `:func` you could do `@transform(df, :b = :func(:a))` and it would be clear that `:func` resolves to a column.
This particular example might seem like a niche problem, but it's just one of these tradeoffs that you have to make when overloading syntax with a different meaning. I personally like it if there's a small rule set which is then consistently applied. I'd argue that's not always the case with dplyr.
I hadn't thought of that tradeoff. After testing just now, if you have a column named `.data` or `.env` those constructs work as if there was no such column, and actually in that case `mutate(df, b = .data + 1)` is an error.
Personally I'll happily take not being able to use those as column names if it means I can avoid always typing : before every in-data variable, but your comment gave me a better understanding of why it would be bad for some other person or scenario, perhaps where short term ease-of-use is lower on the list of priorities.
For your second example, it doesn't come up in R because a data frame column cannot be a function. Columns must be vectors (including lists) and you could have a vector where one or all elements are functions, but the column itself cannot be a function (functions are not vectors), so there's no ambiguity there. To call a function stored in your data frame you'd have to access an element of the column, and any access method, e.g. `[[` or `$`, would make the resulting set of characters invalid as the name of an object (without backticks, which would then disambiguate the intent):

    df <- tibble(x = list(function(x) x + 1))
    df %>% mutate(y = x[[1]](3))

Separate from dplyr, in R when you use `(` to call a function it searches only for functions by that name:

    log <- 3
    log(1)   # 0
    frog <- 3
    frog(3)  # Error in frog(3) : could not find function "frog"
    log <- function(x) x^2
    log(1)   # 1

In Julia you could have an `AbstractVector` type also be callable, or more likely a vector of callable objects (and the operation is performed row-wise).
I agree it's unlikely that a user will name their column `.data`. But it certainly saves developer effort from thinking about these issues.
The larger concern, really, is that Julia needs to know which things are columns and which things are variables in an expression at parse time in order to generate fast code for a DataFrame. It needs to do this without inspecting the data frame, since the data frame's contents aren't known at parse time.
One option would be to make all literals columns. But then you run into issues with things like `missing`, which would have to be escaped or not recognized as a column. Its hard to predict all the problems there, and any escaping rules would definitely have to be more complicated than R's. So we require `:` and take the easy way out, which has the added benefit for new users who might get confused about the variable-column distinction.
It would be interesting to profile the 2nd version though. Assuming the non-standard evaluation has performance benefits (which it does in DataFramesMeta.jl), are you eliminating those benefits when you use `.data[[a_var]]`?
It's even better when you have the "." variable which gets populated.
But in general yeah, R plays pretty fast and loose with scopes, and lets you capture expressions as arguments and execute them in a different scope from the outside one