Show HN: Codemodder – A new codemod library for Java and Python

codemodder.io

37 points by nahsra 2 years ago · 8 comments · 3 min read

Hi HN, I’m here to show you a new codemod library. In case you’re not familiar with the term "codemod", here’s how it was originally defined AFAICT:

> Codemod is a tool/library to assist you with large-scale codebase refactors

Codemods are awesome, but I felt they were far from their potential, and so I’m very proud to show you all an early version of a codemod library we’ve built called Codemodder (https://codemodder.io) that we think moves the "field" forward. Codemodder supports both Python and Java (https://github.com/pixee/codemodder-python and https://github.com/pixee/codemodder-java). The license is AGPL, please don’t kill me.

Primarily, what makes Codemodder different is our design philosophy. Instead of trying to write a new library for both finding code and changing code, which is what traditional codemod libraries do, we aim to provide an easy-to-use orchestration library that helps connect idiomatic tools for querying source code and idiomatic tools for mutating source code.

So, if you love your current linter, Semgrep, Sonar, PMD, CodeQL, or whatever else for querying source code – use it! If you love JavaParser or LibCST for changing source code – use them! We’ll provide you with all the glue and make building, testing, packaging, and orchestrating them easy.
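
To make the "glue" idea concrete, here is a minimal Python sketch of a detector/transformer split. It is not Codemodder's actual API: `Finding`, `run_codemod`, `find_none_comparisons`, and `use_is_none` are hypothetical names, and the regex-based detector and transformer merely stand in for a real query tool and a real syntax-tree library.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    """A location reported by some query tool (hypothetical shape)."""
    rule_id: str
    line: int  # 1-based line number


Detector = Callable[[str], Iterable[Finding]]
Transformer = Callable[[str, Finding], str]


def run_codemod(source: str, detect: Detector, transform: Transformer) -> str:
    """Glue: ask the detector where to change, let the transformer change it."""
    lines = source.splitlines(keepends=True)
    for finding in detect(source):
        index = finding.line - 1
        lines[index] = transform(lines[index], finding)
    return "".join(lines)


def find_none_comparisons(source: str) -> Iterable[Finding]:
    """Stand-in detector; a real setup would read a linter's report instead."""
    for number, text in enumerate(source.splitlines(), start=1):
        if re.search(r"==\s*None\b", text):
            yield Finding(rule_id="none-comparison", line=number)


def use_is_none(line: str, finding: Finding) -> str:
    """Stand-in transformer; a real one would edit a syntax tree, not text."""
    return re.sub(r"==\s*None\b", "is None", line)


before = "if user == None:\n    user = guest()\n"
after = run_codemod(before, find_none_comparisons, use_is_none)
print(after)
```

The point of the split is that either half can be swapped out independently: the orchestration function only needs findings with locations and a function that rewrites the code at those locations.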

Here are the problems with today’s codemod libraries, and how Codemodder addresses them.

1. They’re not expressive enough. They tend to offer barebones APIs for querying code. There’s simply no way for these libraries to compete with purpose-built static analysis tools for querying code, so we should use them instead.

2. They produce changes without any context. Understanding why a code change is made is important. If the change was obvious to the developer receiving the code change, they probably wouldn’t have made the mistake in the first place! Storytelling is everything, and so we guide you towards making changes that are more likely to be merged.

3. They don’t handle injecting dependencies well. I have to say we’re not great at this yet either, but we have some of the basics and will invest more.

4. Most apps involve multiple languages, but all of today’s codemod libraries are for one language, so they are hard to orchestrate for a single project. We’ve put a lot of work into making sure these libraries are aligned with open source API contracts and formats (https://github.com/pixee/codemodder-specs) so they can be orchestrated similarly by downstream automation.

The idea is "don’t write another PR comment saying the same thing, write a codemod to just make the change automatically for you every time". We hope you like it, and are excited to get any feedback you might have!

westurner 2 years ago

How does libCST compare to e.g. pyCQA/redbaron? What about for EA Evolutionary Algorithms; does it preserve comments, or update docstrings and type annotations in mutating the code under test?

Is it necessary to run `black` (and `pre-commit run --all-files`) to format the code after mutating it?

Instagram/LibCST: https://github.com/Instagram/LibCST

PyCQA/redbaron: https://github.com/PyCQA/redbaron

E.g. PyCQA/bandit does static analysis for security issues in Python code: https://github.com/PyCQA/bandit

https://news.ycombinator.com/item?id=38677294

https://news.ycombinator.com/item?id=24511280 ... https://analysis-tools.dev/tools?languages=python

  • drdavella 2 years ago

    Hi! Great questions. I'm the lead maintainer of the Python version of the Codemodder framework so I'll do my best to answer.

    > How does libCST compare to e.g. pyCQA/redbaron?

    LibCST is similar to redbaron in that it preserves comments and whitespace. The "CST" in LibCST stands for "concrete syntax tree", which keeps those formatting details, as opposed to an "abstract syntax tree" (AST), which discards them. Our goal is to make the absolute minimal changes required to harden and improve code, and messing with whitespace would be counter to that goal. It's worth noting that redbaron no longer appears to be maintained, and the most recent version of Python it supported was 3.7, which is itself now EOL.
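
A small sketch of the lossy side of that comparison, using only Python's standard library (Python 3.9+ for `ast.unparse`). It shows that a round trip through the `ast` module drops comments and normalizes spacing; it does not call LibCST itself, and the helper names `ast_round_trip` and `comments_in` are made up for this illustration.

```python
import ast
import io
import tokenize

SOURCE = "x = 1  # keep me\ny   =   2\n"


def ast_round_trip(source: str) -> str:
    """Parse to an abstract syntax tree and regenerate code from it."""
    return ast.unparse(ast.parse(source))


def comments_in(source: str) -> list[str]:
    """Collect comment tokens, which exist in the text but not in the AST."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.COMMENT]


regenerated = ast_round_trip(SOURCE)
print(comments_in(SOURCE))       # the comment is present in the original text
print(comments_in(regenerated))  # and gone after the AST round trip
print(regenerated)
```

A concrete syntax tree avoids this by keeping comments and whitespace as part of the tree, so untouched code can be written back out unchanged.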

    > What about for EA Evolutionary Algorithms

    Can you elaborate? I am familiar with the concept of evolutionary algorithms but I'm not sure I understand what you mean in this context.

    > does it preserve comments, or update docstrings and type annotations in mutating the code under test?

    Codemodder does preserve comments. Currently none of our codemods update docstrings; I'm not sure we currently have any cases where that would make sense. We do make an effort to update type annotations where appropriate.

    > Is it necessary to run `black` (and `pre-commit run --all-files`) to format the code after mutating it?

    Yes, it is currently necessary to run `black` and `pre-commit` yourself if you're using them on your project. While `black` is incredibly popular, we can't assume that it's being used on any given project. If Codemodder ran `black` itself, each updated file could be completely reformatted, which would lead to very noisy and difficult-to-review changes. I would like to explore better solutions to this issue going forward.
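
To illustrate the noise problem, here is a rough sketch using the standard library's `difflib`. The three snippets are hand-written for this example: `REFORMATTED` only imitates what an opinionated formatter might produce and is not actual `black` output, and `changed_line_count` is a hypothetical helper.

```python
import difflib

# A file whose existing style differs from what a formatter would choose.
ORIGINAL = [
    "import yaml\n",
    "def load( path ):\n",
    "    with open( path ) as f:\n",
    "        return yaml.load( f )\n",
]

# A targeted edit touches only the line that needs to change.
TARGETED = ORIGINAL[:3] + ["        return yaml.safe_load( f )\n"]

# The same edit followed by a whole-file reformat (hand-written imitation).
REFORMATTED = [
    "import yaml\n",
    "\n",
    "\n",
    "def load(path):\n",
    "    with open(path) as f:\n",
    "        return yaml.safe_load(f)\n",
]


def changed_line_count(before: list[str], after: list[str]) -> int:
    """Count added and removed lines, ignoring ndiff's '? ' hint lines."""
    diff = difflib.ndiff(before, after)
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


targeted_noise = changed_line_count(ORIGINAL, TARGETED)
reformat_noise = changed_line_count(ORIGINAL, REFORMATTED)
print(targeted_noise, reformat_noise)
```

A reviewer of the targeted change sees one removed and one added line; with the reformat mixed in, the one meaningful edit is buried among formatting-only changes.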

    I am familiar with `bandit`. It's a fairly simple security linter and is useful for finding some common issues. It's also pretty prone to false positives and noisy findings. Not every problem identified by `bandit` is something that can be automatically fixed; for example I can't replace a hard-coded password without making a lot of (breaking) assumptions about the structure of your application and the manner in which it is deployed.

    I'd love to get your feedback on Python Codemods! Give us a star on GitHub and feel free to open an issue or PR: https://github.com/pixee/codemodder-python

morgante 2 years ago

Interesting approach of basically providing a meta-layer on top of existing tools.

Do you have an example of how you inject context into the codemods? The approach we've taken at Grit is two-fold:

1. When something must be addressed (ex. `todo` [0]), we have functions that wrap messages into the source code to ensure anyone sees the info until it's fixed. We pick up these messages automatically on our SaaS platform.

2. For non-blocking comments, we have a `log` function that any query can call to surface info into the result stream on the CLI + pull requests without it ending up in the final PR.

> 4. all of today’s codemod libraries are for one language, so they are hard to orchestrate for a single project.

This isn't entirely true! Grit, my project, was built to be multi-language from the start: https://docs.grit.io/language/overview

[0] https://docs.grit.io/language/functions#todo

  • nahsra (OP) 2 years ago

    Grit looks cool! My apologies for the omission, I was unaware of it. I could have anchored too hard to the word "codemod" in my searches. Your tool looks awesome!

    > Do you have an example of how you inject context into the codemods?

    When you say "context", I want to make sure we're talking about the same thing, and the question makes me think we're not there yet. We're basically saying that storytelling about the changes is very important, so we bake invariants into the APIs of the codemods themselves: codemod authors are forced to provide descriptions, reasons, justification -- whatever -- at the key points.
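
One way such a constraint could look in Python is an abstract base class that cannot be instantiated without its descriptive fields. This is a sketch under assumptions, not Codemodder's real interface: `DescribedCodemod`, `Change`, `UseAssertEqual`, and `Undocumented` are hypothetical names.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    line: int         # 1-based line that was modified
    description: str  # why this specific line changed


class DescribedCodemod(ABC):
    """Hypothetical base class: the story is mandatory, not optional."""

    @property
    @abstractmethod
    def summary(self) -> str: ...

    @property
    @abstractmethod
    def rationale(self) -> str: ...

    @abstractmethod
    def apply(self, source: str) -> tuple[str, list[Change]]: ...


class UseAssertEqual(DescribedCodemod):
    summary = "Replace assertEquals with assertEqual"
    rationale = "assertEquals is a deprecated unittest alias, removed in Python 3.12."

    def apply(self, source: str) -> tuple[str, list[Change]]:
        changes, out = [], []
        for number, text in enumerate(source.splitlines(keepends=True), start=1):
            fixed = text.replace(".assertEquals(", ".assertEqual(")
            if fixed != text:
                changes.append(Change(number, "assertEquals -> assertEqual"))
            out.append(fixed)
        return "".join(out), changes


class Undocumented(DescribedCodemod):
    """Omits summary and rationale, so Python refuses to instantiate it."""

    def apply(self, source: str) -> tuple[str, list[Change]]:
        return source, []


new_source, changes = UseAssertEqual().apply("self.assertEquals(a, b)\n")
print(new_source, changes)

try:
    Undocumented()
except TypeError as error:
    print("rejected:", error)
```

Because every reported change carries a description and every codemod carries a rationale, downstream tooling can assemble a pull request description without the author having to remember to write one.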

blackfur 2 years ago

Have you heard about Mixin? What advantages could Codemodder have over SpongePowered Mixins?

  • gilday 2 years ago

    Codemodder and SpongePowered Mixin cater to different scenarios. Codemodder is ideal for transforming source code you own, based on specific patterns. Changing source code allows the changes to be tracked, reviewed, and analyzed using standard tools like compilers and static analysis. It's great for large-scale codebase refactoring.

    In contrast, SpongePowered Mixin uses Java bytecode manipulation to transform the bytecode of a specific type. Bytecode manipulation comes with added risks and complexity, so this method is typically reserved for when you need to change the behavior of some external library or framework type. For example, Mixin is useful in Minecraft modding because it allows modders to change the behavior of externally defined Minecraft types.

    In essence, choose Codemodder for large-scale refactors to your source code, and Mixin to modify the bytecode of external Java types.

    • blackfur 2 years ago

      Ah, it seems like I greatly misunderstood the purpose of Codemodder. Thanks for the clarification.
