We've been dealing with a persistent problem in SDK generation: when you add custom files to a generated codebase, they get deleted during the next update. Similarly, any custom code added to generated files gets overwritten in subsequent updates.
After building API client SDK generators for dozens of companies, we kept hearing the same frustration: "Your generated SDK is great, but I need to add my own wrapper class here, or a custom error handler there, and they keep getting deleted when you regenerate."
So we built something different: a code generator that treats your custom files and code as first-class citizens. Our generator also maintains edits to the actual generated files, an even greater challenge, but that solution will be detailed in a future post. This article explains how we preserve custom files, a solution that leans heavily on a declarative macro (macro_rules!) in the Rust programming language.
You generate a Python SDK:
    my_sdk/
    ├── client.py
    ├── models/user.py
    └── exceptions.py
You add custom utilities:
    my_sdk/
    ├── client.py
    ├── models/user.py
    ├── exceptions.py
    ├── retry_wrapper.py
    └── test_helpers.py
The next API update comes in. You regenerate. What happens to retry_wrapper.py and test_helpers.py?
In most generators: Gone.
The Roach System: API-First Generation
We call it the "Roach" system because, like cockroaches, your custom files survive everything.
This can't just be a CLI tool. The file resolution problem requires tracking what was generated previously to intelligently clean up obsolete files. But we can't force users to keep old API specs around. The workflow needs to be simple to fit nicely into a modern API development flow:
Upload new spec
Regenerate
All customizations maintained and new logic added
So we built our code generator as a synchronous HTTP API:
First SDK generation: POST /sdk returns a complete tar of the SDK
Updates: POST /sdk/{id}/update returns a git patch containing only the changes needed to reflect the new API
Smart File Resolution
Here's the tricky part. Your API removes the /legacy-reports endpoint. When you regenerate:
legacy_report.py should be deleted (no longer needed)
retry_wrapper.py should be kept (you added it)
But both look identical to the generator: files that exist but aren't in the current generation spec.
The solution is tracking what was generated before. We built a simple macro in Rust that defines exactly which files belong to the generated SDK structure. Not familiar with Rust macros? Learn more here.
The Roach Directory Macro
Our roach_dir! macro creates structured definitions of what files should exist in each part of the SDK:
    #[macro_export]
    macro_rules! roach_dir {
        (
            $struct_name:ident, {
                $( $field:ident: { path: $path:expr, boilerplate: $boilerplate:expr } ),* $(,)?
            }
        ) => {
            #[derive(Debug, Clone)]
            #[allow(dead_code)]
            pub struct $struct_name {
                pub root: camino::Utf8PathBuf,
                $(pub $field: $crate::RoachPath,)*
            }

            impl $crate::RoachDirTrait for $struct_name {
                fn new(root: &camino::Utf8PathBuf) -> Self {
                    Self {
                        root: root.clone(),
                        $($field: $crate::RoachPath::new(&root.join($path), $boilerplate),)*
                    }
                }

                fn paths(&self) -> Vec<$crate::RoachPath> {
                    vec![$(self.$field.clone(),)*]
                }
            }
        };
    }
This macro generates structs that track every file that should be part of the generated SDK, along with their boilerplate content.
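To make the expansion concrete, here is a minimal, std-only sketch of the supporting types the macro relies on. It rests on several assumptions: RoachPath is taken to hold a `path` and a `boilerplate` field, `std::path::PathBuf` is swapped in for `camino::Utf8PathBuf` so the snippet compiles without external crates, and the `$crate::` prefixes are dropped. `roach_dir_sketch!`, `DemoPkg`, and `demo_paths` are hypothetical names used only for illustration, not part of the real generator.

```rust
use std::path::{Path, PathBuf};

/// One generated file: where it lives and the content it starts with.
/// (Field names are assumed; the real RoachPath may differ.)
#[derive(Debug, Clone, PartialEq)]
pub struct RoachPath {
    pub path: PathBuf,
    pub boilerplate: &'static str,
}

impl RoachPath {
    pub fn new(path: &Path, boilerplate: &'static str) -> Self {
        Self { path: path.to_path_buf(), boilerplate }
    }
}

/// Implemented by every struct the macro generates.
pub trait RoachDirTrait {
    fn new(root: &Path) -> Self;
    fn paths(&self) -> Vec<RoachPath>;
}

// Simplified stand-in for roach_dir!, using std paths instead of camino.
macro_rules! roach_dir_sketch {
    (
        $struct_name:ident, {
            $( $field:ident: { path: $path:expr, boilerplate: $boilerplate:expr } ),* $(,)?
        }
    ) => {
        #[derive(Debug, Clone)]
        #[allow(dead_code)]
        pub struct $struct_name {
            pub root: PathBuf,
            $(pub $field: RoachPath,)*
        }

        impl RoachDirTrait for $struct_name {
            fn new(root: &Path) -> Self {
                Self {
                    root: root.to_path_buf(),
                    $($field: RoachPath::new(&root.join($path), $boilerplate),)*
                }
            }

            fn paths(&self) -> Vec<RoachPath> {
                vec![$(self.$field.clone(),)*]
            }
        }
    };
}

// Hypothetical two-file package; the boilerplate strings are placeholders.
roach_dir_sketch!(DemoPkg, {
    pyproject: { path: "pyproject.toml", boilerplate: "[project]\n" },
    gitignore: { path: ".gitignore", boilerplate: "__pycache__/\n" },
});

/// Lists every path DemoPkg declares under `root`, in declaration order.
pub fn demo_paths(root: &str) -> Vec<PathBuf> {
    DemoPkg::new(Path::new(root))
        .paths()
        .into_iter()
        .map(|p| p.path)
        .collect()
}
```

Calling `demo_paths("my_sdk")` yields the two declared paths joined onto the root. The point of the design is that the list of generated files is fixed at compile time, so adding a field to the macro invocation is the only way a file becomes "owned" by the generator.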
Defining SDK Structure
We use the macro to define the complete structure of a Python SDK:
    roach_dir!(
        PyPkg, {
            pyproject: {
                path: "pyproject.toml",
                boilerplate: include_str!("./boilerplate/pkg/pyproject.toml")
            },
            gitignore: {
                path: ".gitignore",
                boilerplate: include_str!("./boilerplate/pkg/gitignore.txt")
            },
        }
    );

    roach_dir!(
        PySrc, {
            init: {
                path: "__init__.py",
                boilerplate: ""
            },
            client: {
                path: "client.py",
                boilerplate: include_str!("./boilerplate/src/client.py")
            },
            environment: {
                path: "environment.py",
                boilerplate: include_str!("./boilerplate/src/environment.py")
            },
            readme: {
                path: "README.md",
                boilerplate: ""
            }
        }
    );
The Resolution Algorithm
With this structure defined, we can resolve which files should be kept, updated, or deleted:
    pub fn resolve_paths_boilerplate(
        sdk_paths: &[RoachPath],
        prev_sdk_paths: &[RoachPath],
        git_paths: &[RoachPath],
    ) ->
The core logic differentiates between generated and custom files:
    // Files generated last time that the current generation no longer produces.
    let deleted_sdk_paths: Vec<_> = prev_sdk_paths
        .iter()
        .filter(|prev| !sdk_paths.iter().any(|curr| curr.path == prev.path))
        .collect();

    for git_path in git_paths {
        let is_generated_now = sdk_paths.iter().any(|curr| git_path.path == curr.path);
        let was_generated_before = deleted_sdk_paths.contains(&git_path);

        // Neither generated now nor previously: a custom file, so keep it.
        if !is_generated_now && !was_generated_before {
            result.push(git_path.clone())
        }
    }
How It Works in Practice
Each RoachPath knows both its filesystem location and its expected boilerplate content. This means we can:
Track ownership: Every generated file is explicitly declared in our macro definitions
Detect changes: Compare current boilerplate against what should be generated
Preserve custom files: If a file exists but isn't in any roach_dir! definition, it's custom
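The "detect changes" point in the list above can be sketched as a plain comparison between what is on disk and the boilerplate a RoachPath expects. This is an illustrative assumption about how such a check might look, not the generator's actual code; `Change` and `detect_change` are hypothetical names.

```rust
/// Outcome of comparing a file on disk with its expected generated content.
#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    /// The file does not exist yet and must be written.
    Create,
    /// The file exists but differs from the expected content.
    Update,
    /// The file already matches; nothing to emit in the patch.
    Unchanged,
}

/// `on_disk` is `None` when the file is missing from the repository.
pub fn detect_change(on_disk: Option<&str>, expected: &str) -> Change {
    match on_disk {
        None => Change::Create,
        Some(existing) if existing == expected => Change::Unchanged,
        Some(_) => Change::Update,
    }
}
```

Under this sketch, only the `Create` and `Update` outcomes would contribute hunks to a patch.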
The macro system eliminates guesswork. When your API removes the /legacy-reports endpoint, the corresponding legacy_report.py file simply won't appear in the new sdk_paths list, but it will be in prev_sdk_paths, so we know to delete it.
Meanwhile, retry_wrapper.py appears in neither list—it exists in your Git repository but was never part of our generated structure, so it gets preserved.
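The reasoning above can be sketched as a three-way classification of every file found in the repository. To stay self-contained, this version uses plain string paths rather than RoachPath values, and it labels all three outcomes instead of collecting only the custom files. `Fate` and `classify` are hypothetical names for illustration.

```rust
use std::collections::HashSet;

/// What should happen to a file found in the user's repository.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Fate {
    /// Declared by the current generation.
    Generated,
    /// Generated last time but not this time: safe to delete.
    Obsolete,
    /// Never generated: user-authored, leave untouched.
    Custom,
}

/// Classifies each path in `git_paths` against the current and previous
/// generated file lists, preserving the order of `git_paths`.
pub fn classify<'a>(
    sdk_paths: &[&str],
    prev_sdk_paths: &[&str],
    git_paths: &[&'a str],
) -> Vec<(&'a str, Fate)> {
    let current: HashSet<&str> = sdk_paths.iter().copied().collect();
    let previous: HashSet<&str> = prev_sdk_paths.iter().copied().collect();

    git_paths
        .iter()
        .map(|&path| {
            let fate = if current.contains(path) {
                Fate::Generated
            } else if previous.contains(path) {
                Fate::Obsolete
            } else {
                Fate::Custom
            };
            (path, fate)
        })
        .collect()
}
```

With `legacy_report.py` present only in the previous list and `retry_wrapper.py` present in neither, the first comes back as `Fate::Obsolete` and the second as `Fate::Custom`, matching the behavior described above.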
Real Example
You call the update endpoint with a new API spec:
Get back a git patch:
    diff --git a/client.py b/client.py
    +    def create_organization(self, data: dict):
    +        return self._request("POST", "/organizations", data)
    diff --git a/models/organization.py b/models/organization.py
    new file mode 100644
    +@dataclass
    +class Organization:
    +    id: str
    +    name: str
Apply with git apply patch.diff. Your custom files (retry_wrapper.py, test_helpers.py) are untouched. Only generated files that actually changed are in the patch.
Traditional generators force an ugly choice: accept generated code as-is, or maintain complex separation between generated and custom code.
Our API-first approach fixes both the technical problem and the workflow problem:
Instant feedback: Synchronous code generation, no waiting
Precise updates: Git patches show exactly what changed
Easy integration: HTTP APIs work with CI/CD and automation
Smart cleanup: Removes obsolete generated files, keeps your custom ones
Your teammates can confidently add files knowing they won't disappear. Your CI can call a simple endpoint to get updates. You get code generation without the usual constraints.
Maintaining Edits to Generated Files: Post Coming Soon
We will publish a post explaining how our unique approach to codegen unlocks maintaining edits to generated files.