Semgrep: Writing quick rules to verify ideas
blog.deesee.xyz
I use semgrep for semantic search (and replace, sometimes).
Their docs and website try very hard to suggest you should use it for some kind of CI process, but so far I haven't found any need to do so. I can maybe see it being useful in a pre-commit hook.
It's VERY handy for semantic searches though - in situations where ripgrep would be useless due to multi-line matches.
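I set up this shell function to make it a bit less verbose for Python patterns:
pygrep () {
  # First argument is the pattern; everything after it is a path to search.
  pat="$1"
  shift
  # Keep the remaining arguments as an array (bash/zsh) so paths with spaces survive.
  filez=("$@")
  # set -x in a subshell echoes the command, as bash -x did, without re-parsing the pattern.
  (set -x; semgrep --lang=python --pattern "$pat" "${filez[@]}")
}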
Usage is something like: pygrep 'myfunc(..., needle_arg=..., ...)'
Note that ripgrep can do multi-line searches with the -U flag.
Not that this detracts from your main point. Semgrep is much smarter than ripgrep and goes well beyond multi-line searches.
I just wanted to clarify the small thing.
thanks for ripgrep!
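For anyone comparing the two, here is a rough sketch of what the -U approach can look like for the same "call with a keyword argument" search. The function name rg_call and its argument order are made up for illustration; only rg and its -U flag come from the comment above.

```shell
# Hypothetical helper: find calls to a function that pass a given keyword
# argument, even when the call is split across several lines.
# Usage: rg_call FUNC KWARG [PATH...]
rg_call () {
  func="$1"
  kwarg="$2"
  shift 2
  # -U lets a match span line breaks. [^)]* then runs across newlines up to
  # the first closing parenthesis, so a nested call before the keyword
  # (for example myfunc(g(x), needle_arg=1)) is missed.
  rg -U "${func}\([^)]*${kwarg}=" "$@"
}
```

Something like rg_call myfunc needle_arg src/ would be the rough equivalent of the pygrep usage above. It is plain regex matching, so it knows nothing about strings, comments, or nesting, which is where a semgrep pattern is the better fit.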
Heya, Semgrep maintainer here. Just wanted to ask you about an idea I had before, how would you feel about specifying the language parameter in the binary name, making the invocation look like this?
semgrep.py search 'myfunc(..., needle_arg=..., ...)'
And then the other subcommands would remain
semgrep scan --config auto
to scan with all recommended rules and
semgrep ci
to scan in CI jobs.
I feel like the "semgrep.py" idea is not that good, because someone could legitimately have a semgrep.py or semgrep.js or similar file which wraps semgrep.
Edit: thanks for maintaining semgrep. I started using it heavily at my day job, and the team has started writing frontends for it.
If someone had such a wrapper, I'd expect if it's globally available in $PATH then it'd have a more descriptive name, and if it's not in $PATH, then you'd likely run it as `python semgrep.py` or `./semgrep.py`. Does that sound right to you?
Why not `semgrep-py`?
Though, as I tried to type that, I typed semgrep.py twice. The dot name really seems like a file extension, though. I'm torn.
Also, first time trying the tool and I love it!
Yeah, I don't really have a good reason, it just feels like the wrong call :/
Maybe it's that the dot makes it feel like 'variants' of 'semgrep' (even if for the wrong reason) but semgrep-py feels like an entirely distinct binary from semgrep or any other variants.
>> Their docs and website try very hard to suggest you should use it for some kind of CI process...
Just a piece of feedback for the record: I have been stuck in exactly the same place the few times I was interested in trying out a ripgrep alternative that understood semantics, but didn't have such an urgent need to actually understand how to get things going.
Thanks! Could you let me know what you'd change on our Getting Started[0] page to explain the CLI usage better?
I'd suggest adding at least one example of using `semgrep --pattern <pattern>`. That seems pretty well hidden in the docs, and for me it's the most useful option.
I wasn't trying to search for things that other people thought were interesting; I wanted a tool that would search for some pattern I thought of - and preferably without having to write a yaml file.
Thanks a lot! I opened a pull request with your suggestion here: https://github.com/returntocorp/semgrep-docs/pull/744
Edit: It's approved but that's just our CEO :D I'll wait for an approval from our tech writers who are in non-US time zones, so your suggestion will likely land tomorrow. Thank you!
I really wish it could infer the language from the file type. One of the things that prevents me from reaching for semgrep as often as I want to is the complexity, and having it infer language from filetype would be nice.
Good idea! I opened an issue here: https://github.com/returntocorp/semgrep/issues/6331
I love it, thank you!
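Until something like that exists in semgrep itself, a user-side workaround might look like the sketch below. The names lang_for_file and autogrep are hypothetical, and the extension table is a small illustrative subset; it only handles a single file per call and is not how semgrep itself resolves languages.

```shell
# Hypothetical helper: map a file extension to a value for semgrep's --lang flag.
lang_for_file () {
  case "$1" in
    *.py) echo python ;;
    *.js) echo javascript ;;
    *.ts) echo typescript ;;
    *.rb) echo ruby ;;
    *.go) echo go ;;
    *) return 1 ;;  # unknown extension: let the caller decide what to do
  esac
}

# Hypothetical wrapper. Usage: autogrep PATTERN FILE
autogrep () {
  pat="$1"
  file="$2"
  lang="$(lang_for_file "$file")" || {
    echo "autogrep: cannot infer language for $file" >&2
    return 1
  }
  semgrep --lang="$lang" --pattern "$pat" "$file"
}
```

Usage would be something like: autogrep 'myfunc(..., needle_arg=..., ...)' app.py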
Don't you have to shift the arguments, so that `$1` does not also end up in `filez`?
There is a `shift` in the function
I was looking for something like this the other day but then ended up just using RubyVM::AbstractSyntaxTree.parse_file and then rolled my own visitor on top of the AST. It's cool what they can do here but I think any language that exposes its AST is amenable to this kind of analysis, you just have to write some code to do it. The main bottleneck in my experience is just being familiar with the AST structure and how it maps to source syntax. It's cool that they have abstracted a lot of the commonality among several languages, definitely gonna look into this next time I need semantic code search.
Very cool. Thank you for writing this!