Show HN: Luna is a Clojure domain specific language that translates to regex
github.com
My approach was simpler: provide some composability.
;compiled at run time
(re/or #"a" (re-pattern "b"))
=> #"(a|b)a"
;compiled at macro time
(re/or #"a" (re/and #"b"))
=> #"(a|(b))"
https://github.com/coltnz/re-ext
EDIT: This comment is wrong. I misread `(?<=y)x` as `(?=y)x` and wrongly concluded that the key ":after" is incorrect when writing this comment.
I’m confused. Given a string “foobar”, `(?<=foo)bar` should match “bar”. Or is this implicitly `^(?<=foo)bar$` in Clojure? I used to know the answer but it’s been several years since I’ve used Clojure.
Oops, I was mistakenly thinking of `(?=foo)bar` when writing that comment. Please disregard the parent comment.
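For anyone else who mixes these two up, here is a minimal sketch of the difference. It uses Python's `re` module rather than Clojure, on the assumption that the two assertions behave the same way in both engines for this simple case.

```python
import re

text = "foobar"

# Lookbehind: match "bar" only where "foo" comes immediately before it.
lookbehind = re.search(r"(?<=foo)bar", text)

# Lookahead: require "foo" to start at the current position, then try to
# consume "bar" from that same position, which can never succeed.
lookahead = re.search(r"(?=foo)bar", text)

print(lookbehind.group(0) if lookbehind else None)  # bar
print(lookahead)  # None
```

So `(?<=foo)bar` does find "bar" inside "foobar" with an unanchored search, while `(?=foo)bar` cannot match anything.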
There’s also Regal: https://github.com/lambdaisland/regal
I like to think of DSLs for regex as a good way for people who don't have time to master regex itself, or are beginners, to actually use regex. I've been programming for about 10 years and only fairly recently took the time to properly learn regex, but I could've definitely benefitted from a DSL way before that. Good job Abhinav!
The fact that nobody wants to learn or use regex is a pretty strong indicator that it's not a very good language. I think we've largely settled on it as matter of historical contingency rather than because it has any particular merit. For starters there's no capacity to name expressions in order to combine them at a higher level of abstraction.
I'd love to see something like Rosie take off as an alternative: https://rosie-lang.org/about/
Everybody is learning regex: it's part of any standard curriculum and it's in everyday use on Unix platforms. A rigid implementation without variables or recursion is a regular automaton, which has a firm place in the computational hierarchy and the benefit of predictable runtime. The markup is just syntactic sugar, and you can of course embed regex in any more powerful programming language that allows variables, like awk, Perl, etc. Perl itself hardly anyone wants to learn anymore, if that's what you mean.
I'm not totally convinced by these English-like regex languages. Regex itself seems like a reasonable language:
[a-z] means a to z
x* means 0 or more xs
...
The problem is that
a) whitespace is significant, so long regexes start to become unreadable
b) no easy way to nest regexes
Perl 6/Raku seemed to solve this by making whitespace meaningless by default and allowing regexes to be nested into other regexes or entire "grammars".
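Outside Raku, some engines offer a partial version of both ideas. A rough sketch in Python, assuming its `re.VERBOSE` flag for insignificant whitespace and plain string interpolation for nesting; the names `OCTET` and `IPV4` are made up for this example.

```python
import re

# A named sub-pattern: one decimal octet in the range 0-255.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"

# re.VERBOSE ignores whitespace and allows comments inside the pattern,
# and the f-string splices the smaller pattern into the larger one.
IPV4 = re.compile(
    rf"""
    {OCTET}                # first octet
    (?: \. {OCTET} ){{3}}  # three more octets, each preceded by a dot
    """,
    re.VERBOSE,
)

print(bool(IPV4.fullmatch("192.168.0.1")))
print(bool(IPV4.fullmatch("256.1.1.1")))
```

This is string pasting rather than real nesting, so it does not give you Raku-style grammars, but it does address points a) and b) for moderately sized patterns.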
This. Also, I try to:
- use only those regex features that are cross-platform (cross-engine?)
- avoid `\w` and similar (what is a word?) and instead use explicit rules (`[a-zA-Z]+`)
- use [] instead of backslashes when escaping (`[(]` instead of `\(`) - especially useful with double/triple escaping
- never use long regexes (there is always a better way)
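To illustrate the bracket-escaping point from the list above, a small sketch in Python (chosen only for illustration; the same idea should carry over to most engines):

```python
import re

sample = "call(42)"

# Raw string with backslash escapes.
with_backslashes = re.compile(r"\([0-9]+\)")

# Same pattern in a non-raw string: every backslash has to be doubled.
doubled = re.compile("\\([0-9]+\\)")

# Bracket form: no backslashes at all, so nothing to double.
with_brackets = re.compile("[(][0-9]+[)]")

for pattern in (with_backslashes, doubled, with_brackets):
    print(pattern.search(sample).group(0))
```

All three find the same text; the bracket form just survives extra layers of string quoting unchanged.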
I never understood why people are afraid of regular expressions.
That said, there are some performance/security implications wrt. backtracking that one should be aware of [0].
[0] https://javascript.info/regexp-catastrophic-backtracking
> avoid `\w` and similar (what is word?) and instead use explicit rules (`[a-zA-Z]+`)
I don't think that's a good idea. What about Chinese, Japanese, Arabic, ...?
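A quick sketch of what is at stake, using Python 3, where `\w` on `str` patterns is Unicode-aware by default and `re.ASCII` narrows it. Other engines have different defaults, so treat this as one data point rather than a general rule.

```python
import re

words = ["hello", "日本語", "مرحبا"]

ascii_letters = re.compile(r"[a-zA-Z]+")    # explicit ASCII letters only
unicode_word = re.compile(r"\w+")           # Unicode word characters
ascii_word = re.compile(r"\w+", re.ASCII)   # \w limited to [a-zA-Z0-9_]

for word in words:
    print(
        word,
        bool(ascii_letters.fullmatch(word)),
        bool(unicode_word.fullmatch(word)),
        bool(ascii_word.fullmatch(word)),
    )
```

Only the default `\w` accepts all three words, which is the trade-off both comments are pointing at: explicit classes are predictable, but they silently exclude most of the world's scripts.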
\w and "<" and ">" are useful in regular expressions for programming source code because you can define them to match the character set of the programming language's identifiers.
Completely agree. For JS, xregexp [1] is a really nice, long-standing library that brings in these kinds of enhancements and makes regular expressions a lot more readable.
My personal irritation is that I often want things like [], () or + as the thing I'm searching for, and then after escaping, things get really unreadable.
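One way around this, where the engine provides it, is to let the library do the escaping. A minimal sketch with Python's `re.escape`; the search string is just an example.

```python
import re

needle = "a[0]+(b)"                 # the literal text we want to find
haystack = "x = a[0]+(b) * 2"

# re.escape backslash-escapes every regex metacharacter in the string.
literal_pattern = re.compile(re.escape(needle))

escaped_hit = literal_pattern.search(haystack)
# Used unescaped, the same text means "a, one or more 0s, then a group b".
unescaped_hit = re.search(needle, haystack)

print(escaped_hit.group(0) if escaped_hit else None)
print(unescaped_hit)
```

The escaped pattern itself is still ugly, but you never have to read or write it by hand.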
Janet Lisp deprecated regex by using PEG: https://janet-lang.org/docs/peg.html
Kind of reminds me of parser combinators but not extensible.
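For readers who haven't met the term, here is a toy sketch of parser combinators in Python. Every name in it (`literal`, `seq`, `choice`, `many`) is made up for this example and is not Janet's PEG API; the point is only that parsers are ordinary functions, so adding a new combinator is just writing another function.

```python
from typing import Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position) or None.
Parser = Callable[[str, int], Optional[Tuple[object, int]]]


def literal(s: str) -> Parser:
    """Match the exact string s at the current position."""
    def parse(text: str, pos: int):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(text: str, pos: int):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def choice(*parsers: Parser) -> Parser:
    """Ordered choice, as in PEGs: the first parser that succeeds wins."""
    def parse(text: str, pos: int):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def many(parser: Parser) -> Parser:
    """Apply parser zero or more times, stopping on failure or no progress."""
    def parse(text: str, pos: int):
        values = []
        while True:
            result = parser(text, pos)
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)
    return parse


# Roughly the regex a(b|c)*d, built from small pieces.
grammar = seq(literal("a"), many(choice(literal("b"), literal("c"))), literal("d"))

print(grammar("abcbd", 0))
print(grammar("abx", 0))
```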
Yes!
Parser combinators extended to typed trees, with an efficient syntax, would be a great basis for a programming language. Macros would be a primary application rather than strapped-on as they are now in every language.
Another clue that we're watching a planet form, a fundamental idea is about to coalesce: Combinators for applying tactics in a theorem proving language such as Lean bear an uncanny resemblance to parser combinators.
Today, most programmers view verified code as impractical. Huh? Is getting hacked, or the wrong answer, practical? In the future we'll be able to use the same code to write efficient programs, efficient randomized tests, and for program verification. It will feel like targeting different architectures.
The key idea in type-based theorem proving is "propositions as types": One defines a type to state a proposition, then demonstrates an instance of that type to prove the proposition. Existence is boolean (true or false) which is also the coefficient semiring for automata theory in computer science. There, one generalizes to arbitrary coefficient rings, and gets probability models such as hidden Markov chains as part of the same theory. How does one generalize "propositions as types" to arbitrary coefficients? Does this subsume random testing? The parallel here is that one wants a valid instance of a type to prove a theorem, and a probability distribution on valid instances of a (far simpler) type for random testing.
Anyone designing a language in this decade should look to program verification and theorem proving for the deepest influences.
The Hitchhiker's Guide to Logical Verification https://github.com/blanchette/logical_verification_2020/raw/...
Benjamin Pierce: Backtracking Generators for Random Testing https://www.youtube.com/watch?v=dfZ94N0hS4I&t=489s
Homotopy Type Theory https://homotopytypetheory.org/book/
makes me think of Emacs's `rx` macro!
Took me a while to appreciate rx. I like the conciseness of normal regexes, but rx is so much more readable and maintainable! I love the extensibility options and the fact it's just a macro. So, at runtime, there is no difference.
Glancing over the examples of this post, I think I like rx better, due to its lispy syntax.
So, basically, AppleScript for regex.
I think this bumps JWZ's problem counter to 3..
"Don't be snarky."
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."
https://news.ycombinator.com/newsguidelines.html
Be respectful. Anyone sharing work is making a contribution, however modest.
Instead of "you're doing it wrong", suggest alternatives. When someone is learning, help them learn more.
When something isn't good, you needn't pretend that it is, but don't be gratuitously negative.