Settings

Theme

Faster alternatives to Go's regexp library

nightfall.ai

5 points by visaals 4 years ago · 4 comments

Reader

visaalsOP 4 years ago

We started off using Go's regexp library for our regex needs, but it wasn't made for our unique use case of running 40+ regular expressions over terabytes of data.

Here's our experience benchmarking Go's regexp against Intel Hyperscan and Google re2, hope it's useful!

mikece 4 years ago

Not questioning anyone's use of Go but wouldn't a library written in C or C++ be able to do regex parsing the fastest?

  • visaalsOP 4 years ago

    Furthermore, Russ Cox (author of Go’s regexp library and significant contributor to re2) mentioned this interesting tradeoff.

    > Second, Go is using a different algorithm for regexp matching than the C implementations in those other languages. The algorithm Go uses guarantees to complete in time that is linear in the length of the input. The algorithm that Ruby/Python/etc are using can take time exponential in the length of the input, although on trivial cases it typically runs quite fast. In order to guarantee the linear time bound, Go’s algorithm’s best case speed a little slower than the optimistic Ruby/Python/etc algorithm. On the other hand, there are inputs for which Go will return quickly and Ruby/Python/etc need more time than is left before the heat death of the universe. It’s a decent tradeoff.

    1. https://awmanoj.github.io/tech/2016/09/08/on-slow-regular-ex...

  • visaalsOP 4 years ago

    Yeah, the language of the regex implementation and the API were the two biggest factors. Go’s regexp package only allowed you to run 1 regex expression at a time while hyperscan and re2 allowed you to pass in a set of expressions to run all at once.

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection