Semgrep: Writing quick rules to verify ideas

blog.deesee.xyz

62 points by adrianomartins 3 years ago · 19 comments

craigds 3 years ago

I use semgrep for semantic search (and replace, sometimes).

Their docs and website try very hard to suggest you should use it for some kind of CI process, but so far I haven't found any need to do so. I can maybe see it being useful in a pre-commit hook.

It's VERY handy for semantic searches though - in situations where ripgrep would be useless due to multi-line matches.

I set up this shell function to make it a bit less verbose for Python patterns:

    pygrep () {
        pat="$1"
        shift
        filez="$*"
        bash -xc "semgrep --lang=python --pattern '$pat' $filez"
    }

Usage is something like:

    pygrep 'myfunc(..., needle_arg=..., ...)'
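
The function above splices the pattern into a `bash -c` string, so a pattern containing a single quote, or a path containing a space, gets split incorrectly. A minimal sketch of a quoting-safe variant follows. It is not craigds's original: it assumes `semgrep` is on `$PATH`, uses only the `--lang` and `--pattern` flags already shown, and drops the `bash -x` command echo.

```shell
# Quoting-safe sketch of the pygrep helper (an illustration, not the original).
pygrep () {
    if [ "$#" -lt 1 ]; then
        echo "usage: pygrep PATTERN [PATH...]" >&2
        return 2
    fi
    pat="$1"
    shift
    # "$@" passes every remaining argument through unchanged,
    # so paths with spaces stay intact as single arguments.
    semgrep --lang=python --pattern "$pat" "$@"
}
```

With this variant, a call such as `pygrep "open('secrets.txt', ...)" "src/my app/"` hands the pattern and the path to semgrep as one argument each.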

burntsushi 3 years ago

Note that ripgrep can do multi-line searches with the -U flag. Not that this detracts from your main point: Semgrep is much smarter than ripgrep and goes well beyond multi-line searches. I just wanted to clarify the small thing.
- craigds 3 years ago
  
  thanks for ripgrep!

underyx 3 years ago

Heya, Semgrep maintainer here. I wanted to ask you about an idea I had a while back: how would you feel about specifying the language parameter in the binary name, making the invocation look like this?
```
    semgrep.py search 'myfunc(..., needle_arg=..., ...)'
```
And then the other subcommands would remain
```
    semgrep scan --config auto
```
to scan with all recommended rules and
```
    semgrep ci
```
to scan in CI jobs.
- leipert 3 years ago
  
  I feel like the `semgrep.py` idea is not that good, because someone could legitimately have a semgrep.py or semgrep.js or similar file which wraps semgrep.
  Edit: thanks for maintaining semgrep. I started using it heavily in my day job, and the team has started writing frontends for it.
  - underyx 3 years ago
    
    If someone had such a wrapper and it were globally available in $PATH, I'd expect it to have a more descriptive name; if it's not in $PATH, you'd likely run it as `python semgrep.py` or `./semgrep.py`. Does that sound right to you?
    
    neura 3 years ago
    
    Why not `semgrep-py`?
    Though, as I tried to type that, I typed semgrep.py twice. The dot name really seems like a file extension, though. I'm torn.
    Also, first time trying the tool and I love it!
    
    underyx 3 years ago
    
    Yeah, I don't really have a good reason, it just feels like the wrong call :/
    Maybe it's that the dot makes it feel like 'variants' of 'semgrep' (even if for the wrong reason) but semgrep-py feels like an entirely distinct binary from semgrep or any other variants.
- O_H_E 3 years ago
  
  >> Their docs and website try very hard to suggest you should use it for some kind of CI process...
  Just a piece of feedback for the record: I have been stuck in exactly the same place the few times I was interested in trying out a ripgrep alternative that understood semantics, but didn't have such an urgent need to actually understand how to get things going.
  - underyx 3 years ago
    
    Thanks! Could you let me know what you'd change on our Getting Started[0] page to explain the CLI usage better?
    [0]: https://semgrep.dev/docs/getting-started/
    
    craigds 3 years ago
    
    I'd suggest adding at least one example of using `semgrep --pattern <pattern>`. That seems pretty well hidden in the docs, and for me it's the most useful option.
    I wasn't trying to search for things that other people thought were interesting; I wanted a tool that would search for some pattern I thought of - and preferably without having to write a yaml file.
    
    underyx 3 years ago
    
    Thanks a lot! I opened a pull request with your suggestion here: https://github.com/returntocorp/semgrep-docs/pull/744
    Edit: It's approved but that's just our CEO :D I'll wait for an approval from our tech writers who are in non-US time zones, so your suggestion will likely land tomorrow. Thank you!
- lbhdc 3 years ago
  
  I really wish it could infer the language from the file type. One of the things that prevents me from reaching for semgrep as often as I want to is the complexity, and having it infer language from filetype would be nice.
  - underyx 3 years ago
    
    Good idea! I opened an issue here: https://github.com/returntocorp/semgrep/issues/6331
    
    lbhdc 3 years ago
    
    I love it, thank you!
iib 3 years ago

Don't you have to shift the arguments, so that `$1` does not also end in `filez`?
- craigds 3 years ago
  
  There is a `shift` in the function

koyanisqatsi 3 years ago

I was looking for something like this the other day but ended up just using `RubyVM::AbstractSyntaxTree.parse_file` and rolling my own visitor on top of the AST. It's cool what they can do here, but I think any language that exposes its AST is amenable to this kind of analysis; you just have to write some code to do it. The main bottleneck in my experience is being familiar with the AST structure and how it maps to source syntax. It's cool that they have abstracted a lot of the commonality among several languages. I'm definitely gonna look into this next time I need semantic code search.

renewiltord 3 years ago

Very cool. Thank you for writing this!
