Today, we are thrilled to announce that Snowflake is open sourcing pg_lake, a powerful set of PostgreSQL extensions that allow Postgres to work easily with your data lakehouse. With pg_lake, you can query, manage and write to Iceberg tables using standard SQL.

Crunchy Data spent several years developing pg_lake as the foundation for its enterprise-ready warehouse offerings before joining Snowflake earlier this year. pg_lake underpinned Crunchy Bridge for Analytics, which introduced new analytical capabilities to Postgres and later became Crunchy Data Warehouse.1 Today we’re open sourcing that same functionality under the Apache license so the broader Postgres community can benefit.
pg_lake introduces a new Iceberg table type where Postgres itself acts as the catalog. Now you can use Postgres to:

- create Iceberg tables directly from Postgres, easily, cheaply and durably, with full Postgres transaction semantics
- query data files and directories that are already in S3, or insert them into Iceberg tables
- query external Iceberg tables, Delta tables and various geospatial file formats

All of these capabilities come with the Postgres we all know and love.

Putting pg_lake into action

Query files in your data lake

Have a CSV in S3 or some Parquet data in Azure blob storage you’re looking to query? It’s as simple as:
create foreign table hits ()
server pg_lake
options (path 'az://your_container_name/hits.parquet');
SELECT count(*)
FROM hits;

If you want to load data directly from a bucket, pg_lake will automatically detect the column types of your files (including CSVs) and allow you to easily query the data.

Load data with the familiar COPY command

pg_lake supports the Postgres COPY mechanism, so you can easily load data without requiring additional tools. Just like directly querying your data (above), loading data from your data lake is as simple as:
COPY hits
FROM 's3://your_bucket_name/your_file_name.xx';

This works well for easily moving data in and out of Postgres. There is a great simplicity in not having to create ETL pipelines, and instead just loading data as it exists, or querying it where it lies. What about working with open formats such as Iceberg?

Create and manage native Iceberg tables

With pg_lake, you can define a table as an Iceberg table. pg_lake manages a full catalog for your Iceberg tables directly in Postgres, while still giving you full Postgres semantics, including transactions. You can move data from Postgres, with its row-based tables, to Iceberg, with its columnar storage, in a fully transactional manner. Let’s take a look at creating an Iceberg table from Postgres:
create table logs (
/* column names and types */
)
using iceberg;
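Because Postgres itself acts as the catalog, writes to an Iceberg table participate in ordinary Postgres transactions. As a sketch (logs_staging is a hypothetical row-based heap table with the same columns as logs):

begin;
-- move rows from a regular Postgres table into the Iceberg table
insert into logs select * from logs_staging;
commit;  -- or rollback; either way the table and its Iceberg metadata stay consistent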
-- Query it like any other table
SELECT date_trunc('day', event_time) AS day, count(*)
FROM logs
GROUP BY day
ORDER BY day;

Do you already have data sitting in Parquet that you want to use to bootstrap your Iceberg table? We can create an initial Iceberg table directly from that, then use the COPY utility to load additional data. Let’s see how that looks with data related to taxi trips in New York City:
create table trips_yellow ()
using iceberg
with (load_from = 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-01.parquet');
\d
┌───────────────────────┬─────────────────────────────┬───────────┬──────────┬─────────┬─────────────┐
│ Column │ Type │ Collation │ Nullable │ Default │ FDW options │
├───────────────────────┼─────────────────────────────┼───────────┼──────────┼─────────┼─────────────┤
│ vendorid │ integer │ │ │ │ │
│ tpep_pickup_datetime │ timestamp without time zone │ │ │ │ │
...
│ airport_fee │ double precision │ │ │ │ │
└───────────────────────┴─────────────────────────────┴───────────┴──────────┴─────────┴─────────────┘
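At this point the Iceberg table has already been populated from the January file, so it can be queried like any other Postgres table, for example:

SELECT count(*)
FROM trips_yellow;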
copy trips_yellow from 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-02.parquet';
copy trips_yellow from 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-03.parquet';
copy trips_yellow from 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-04.parquet';
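COPY also works in the other direction, which is one way to move data back out of Postgres. A sketch of exporting the table to Parquet, assuming a writable bucket (the bucket and file names here are hypothetical, and we assume the output format is inferred from the file extension):

copy trips_yellow to 's3://your_bucket_name/trips_yellow_export.parquet';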
-- or, load files programmatically (and transactionally) in a function or DO block
do $do$
declare month text;
begin
for month in select to_char(m, 'YYYY-MM') from generate_series('2024-03-01', '2024-10-31', interval '1 month') m loop
execute format($$copy trips_yellow from 'https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_%s.parquet'$$, month);
end loop;
end
$do$;

That’s it. pg_lake gives you a native Postgres experience for interacting with open data lakehouse formats like Iceberg and Parquet.

The journey to pg_lake

The story of pg_lake began with a simple observation. The team behind it, first at Crunchy Data and now at Snowflake, has a deep history of building foundational Postgres extensions like PostGIS, Citus, pg_cron, pg_partman and many more. This background gave the team a front-row seat to a recurring challenge customers faced: Their data was split between their Postgres database and object storage, and bridging that gap was a constant struggle.

The team knew it could address this with a native and intuitive Postgres experience. Crunchy Data decided to invest in this capability, and when it first launched Crunchy Bridge for Analytics more than 18 months ago, other solutions were virtually nonexistent. The team didn’t know it then, but it was helping build a movement toward using Postgres as part of a data platform that combines transactional and analytical capabilities. As this space matured, Crunchy Data was acquired by Snowflake and Snowflake Postgres launched, and the value of making Postgres more interoperable and unified with analytics became undeniable.

Why we are open sourcing pg_lake

We believe Postgres is a core component of the modern data stack: it’s far more than just a relational database. With powerful features for JSON, geospatial (PostGIS) and vector search (pgvector), Postgres is already an operational data platform. Meanwhile, the data lakehouse has emerged as the standard for managing analytical data at scale.

We’ve also seen that companies increasingly use a mix of operational and analytical capabilities to deliver modern applications and customer experiences. Snowflake offers an exceptional data platform to do this, and by expanding the capabilities of Postgres, we know we’ll see more people gain the power to develop modern, data- and AI-driven agents and applications.

By open sourcing this technology, we aim to help the broader Postgres community unlock new use cases and accelerate innovation. As pg_lake and Snowflake Postgres become part of the broader Snowflake ecosystem, we look forward to a future where the gap between operational and analytical data is finally closed.

Get started today

We are incredibly excited to share pg_lake with you. It’s ready for you to explore and use now.

1 These blogs are an archive from Crunchy Data, published prior to its acquisition by Snowflake.