Show HN: Forge, the NoSQL to SQL Compiler
https://forge.foxtrotcommunications.net/
I've been a data engineer for years and one thing drove me crazy: every time we integrated a new API, someone had to manually write SQL to flatten the JSON into tables. LATERAL FLATTEN for Snowflake, UNNEST for BigQuery, EXPLODE for Databricks — same logic, different syntax, written from scratch every time.
Forge takes an OpenAPI spec (or any JSON schema) and automatically:
1. Discovers all fields across all nesting levels
2. Generates dbt models that flatten nested JSON into a star schema
3. Compiles for BigQuery, Snowflake, Databricks, AND Redshift from the same metadata
4. Runs incrementally — new fields get added via schema evolution, no rebuilds
The key insight is that JSON-to-table is a compilation problem, not a query problem. If you know the schema, you can generate all the SQL mechanically. Forge is essentially a compiler: schema in, warehouse-specific SQL out.
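To make the "compiler" framing concrete, here is a rough sketch of what one child-table extraction could compile to on two of the targets. The table and field names (orders, payload, items, sku) are invented for illustration, not taken from Forge's actual output:

    -- Hypothetical input: an orders staging table whose payload column
    -- holds JSON with a nested items[] array.

    -- BigQuery target
    SELECT
      o.order_id,
      item_idx AS idx,
      JSON_EXTRACT_SCALAR(item, '$.sku') AS sku
    FROM staging.orders AS o,
      UNNEST(JSON_EXTRACT_ARRAY(o.payload, '$.items')) AS item WITH OFFSET AS item_idx;

    -- Snowflake target: same metadata, different surface syntax
    SELECT
      o.order_id,
      f.index AS idx,
      f.value:sku::string AS sku
    FROM staging.orders AS o,
      LATERAL FLATTEN(input => PARSE_JSON(o.payload):items) f;

Same logical operation in both cases; only the dialect-specific syntax changes, which is what makes the generation mechanical.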
How it works under the hood:
- An introspection phase scans actual data rows and collects the union of ALL keys (not just one sample record), so sparse/optional fields are always discovered
- Each array-of-objects becomes its own child table with a hierarchical index (idx) linking back to the parent — no manual join keys needed
- Warehouse adapters translate universal metadata into dialect-specific SQL:
  BigQuery: UNNEST(JSON_EXTRACT_ARRAY(...))
  Snowflake: LATERAL FLATTEN(input => PARSE_JSON(...))
  Databricks: LATERAL VIEW EXPLODE(from_json(...))
  Redshift: JSON_PARSE + manual extraction
- dbt handles incremental loads with on_schema_change='append_new_columns'
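On the dbt side, a generated child-table model could plausibly look something like this. It's a hand-written sketch targeting BigQuery, not Forge's actual output; the model name, columns, and the loaded_at watermark are all hypothetical:

    -- models/orders_items.sql (hypothetical generated child-table model)
    {{ config(
        materialized='incremental',
        on_schema_change='append_new_columns'
    ) }}

    select
        parent.order_id,                                    -- parent key carried down
        item_idx as idx,                                    -- array offset linking back to the parent row
        json_extract_scalar(item, '$.sku') as sku,          -- one column per discovered field
        json_extract_scalar(item, '$.quantity') as quantity,
        parent.loaded_at
    from {{ ref('stg_orders') }} as parent,
        unnest(json_extract_array(parent.payload, '$.items')) as item with offset as item_idx

    {% if is_incremental() %}
      -- only process rows loaded since the last run
      where parent.loaded_at > (select max(loaded_at) from {{ this }})
    {% endif %}

Because each child row carries the parent key plus its array offset, it joins back to the parent without hand-written keys, and a newly discovered field simply shows up as a new column on the next incremental run.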
The full pipeline: Bellows (synthetic data generation from OpenAPI specs) → BigQuery staging → Forge (model generation + dbt run) → queryable tables + dbt docs. There's also Merlin (AI-powered field enrichment via Gemini) that auto-generates realistic data generators for each field.
I built this because I watched teams spend weeks writing one-off FLATTEN queries that broke the moment an API added a field. Every Snowflake blog post shows you how to parse 3 fields from a known schema — none of them handle schema evolution, arbitrary nesting depth, or cross-warehouse portability.
Try it: https://forge.foxtrotcommunications.net
Happy to answer questions about the architecture, the cross-warehouse compilation approach, or the AI enrichment layer.

The cross-warehouse portability problem you're solving is real. I've watched teams maintain four separate FLATTEN implementations for the same pipeline just because they were multi-cloud. The compiler framing makes sense. Curious how you handle schema drift at the introspection phase, specifically when an API starts returning a field sometimes as a string and sometimes as an object depending on the endpoint response. Does Forge pick a winner or surface it as a conflict for the user to resolve?