Show HN: Forge, the NoSQL to SQL Compiler
https://forge.foxtrotcommunications.net/
I've been a data engineer for years and one thing drove me crazy: every time we integrated a new API, someone had to manually write SQL to flatten the JSON into tables. LATERAL FLATTEN for Snowflake, UNNEST for BigQuery, EXPLODE for Databricks — same logic, different syntax, written from scratch every time.
Forge takes an OpenAPI spec (or any JSON schema) and automatically:
1. Discovers all fields across all nesting levels
2. Generates dbt models that flatten nested JSON into a star schema
3. Compiles for BigQuery, Snowflake, Databricks, AND Redshift from the same metadata
4. Runs incrementally — new fields get added via schema evolution, no rebuilds
The key insight is that JSON-to-table is a compilation problem, not a query problem. If you know the schema, you can generate all the SQL mechanically. Forge is essentially a compiler: schema in, warehouse-specific SQL out.
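To make the "compiler" framing concrete, here is a rough sketch of what one child-table extraction could compile to on two of the targets. The table and field names (orders, payload, items, sku) are invented for illustration, not taken from Forge's actual output:

    -- Hypothetical input: an orders staging table whose payload column
    -- holds JSON with a nested items[] array.

    -- BigQuery target
    SELECT
      o.order_id,
      item_idx AS idx,
      JSON_EXTRACT_SCALAR(item, '$.sku') AS sku
    FROM staging.orders AS o,
      UNNEST(JSON_EXTRACT_ARRAY(o.payload, '$.items')) AS item WITH OFFSET AS item_idx;

    -- Snowflake target: same metadata, different surface syntax
    SELECT
      o.order_id,
      f.index AS idx,
      f.value:sku::string AS sku
    FROM staging.orders AS o,
      LATERAL FLATTEN(input => PARSE_JSON(o.payload):items) f;

Same logical operation in both cases; only the dialect-specific syntax changes, which is what makes the generation mechanical.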
How it works under the hood:
- An introspection phase scans actual data rows and collects the union of ALL keys (not just one sample record), so sparse/optional fields are always discovered
- Each array-of-objects becomes its own child table with a hierarchical index (idx) linking back to the parent — no manual join keys needed
- Warehouse adapters translate universal metadata into dialect-specific SQL:
  BigQuery: UNNEST(JSON_EXTRACT_ARRAY(...))
  Snowflake: LATERAL FLATTEN(input => PARSE_JSON(...))
  Databricks: LATERAL VIEW EXPLODE(from_json(...))
  Redshift: JSON_PARSE + manual extraction
- dbt handles incremental loads with on_schema_change='append_new_columns'
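On the dbt side, a generated child-table model could plausibly look something like this. It's a hand-written sketch targeting BigQuery, not Forge's actual output; the model name, columns, and the loaded_at watermark are all hypothetical:

    -- models/orders_items.sql (hypothetical generated child-table model)
    {{ config(
        materialized='incremental',
        on_schema_change='append_new_columns'
    ) }}

    select
        parent.order_id,                                    -- parent key carried down
        item_idx as idx,                                    -- array offset linking back to the parent row
        json_extract_scalar(item, '$.sku') as sku,          -- one column per discovered field
        json_extract_scalar(item, '$.quantity') as quantity,
        parent.loaded_at
    from {{ ref('stg_orders') }} as parent,
        unnest(json_extract_array(parent.payload, '$.items')) as item with offset as item_idx

    {% if is_incremental() %}
      -- only process rows loaded since the last run
      where parent.loaded_at > (select max(loaded_at) from {{ this }})
    {% endif %}

Because each child row carries the parent key plus its array offset, it joins back to the parent without hand-written keys, and a newly discovered field simply shows up as a new column on the next incremental run.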
The full pipeline: Bellows (synthetic data generation from OpenAPI specs) → BigQuery staging → Forge (model generation + dbt run) → queryable tables + dbt docs. There's also Merlin (AI-powered field enrichment via Gemini) that auto-generates realistic data generators for each field.
I built this because I watched teams spend weeks writing one-off FLATTEN queries that broke the moment an API added a field. Every Snowflake blog post shows you how to parse 3 fields from a known schema — none of them handle schema evolution, arbitrary nesting depth, or cross-warehouse portability.
Try it: https://forge.foxtrotcommunications.net
Happy to answer questions about the architecture, the cross-warehouse compilation approach, or the AI enrichment layer.

The cross-warehouse portability problem you're solving is real. I've watched teams maintain four separate FLATTEN implementations for the same pipeline just because they were multi-cloud. The compiler framing makes sense. Curious how you handle schema drift at the introspection phase, specifically when an API starts returning a field sometimes as a string and sometimes as an object depending on the endpoint response. Does Forge pick a winner or surface it as a conflict for the user to resolve?