GeoParquet
Encoding geospatial data in Apache Parquet.
About
Apache Parquet is a powerful column-oriented data format, built from the ground up as a modern alternative to CSV files. GeoParquet is an incubating Open Geospatial Consortium (OGC) standard that adds interoperable geospatial types (Point, Line, Polygon) to Parquet.
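Under the hood, GeoParquet stores each geometry in an ordinary Parquet binary column, most commonly encoded as Well-Known Binary (WKB), which every release of the specification supports (v1.1.0 also allows native GeoArrow-style encodings). The sketch below uses only Python's `struct` module to show what a WKB point and linestring look like byte by byte. The helper names are hypothetical and the sketch covers 2D geometries only; real writers such as GeoPandas, GDAL or DuckDB rely on a geometry library rather than hand-packing bytes.

```python
import struct

# Geometry type codes from the OGC Simple Features WKB encoding (2D variants only).
WKB_POINT = 1
WKB_LINESTRING = 2
LITTLE_ENDIAN = 1  # WKB byte-order flag: 0 = big endian (XDR), 1 = little endian (NDR)


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point: byte-order flag, uint32 type, then x and y as float64."""
    return struct.pack("<BIdd", LITTLE_ENDIAN, WKB_POINT, x, y)


def linestring_to_wkb(coords: list[tuple[float, float]]) -> bytes:
    """Encode a 2D linestring: header, uint32 point count, then the coordinate pairs."""
    header = struct.pack("<BII", LITTLE_ENDIAN, WKB_LINESTRING, len(coords))
    body = b"".join(struct.pack("<dd", x, y) for x, y in coords)
    return header + body


def wkb_to_point(buf: bytes) -> tuple[float, float]:
    """Decode a 2D WKB point written in either byte order."""
    order = "<" if buf[0] == LITTLE_ENDIAN else ">"
    (geom_type,) = struct.unpack_from(order + "I", buf, 1)
    if geom_type != WKB_POINT:
        raise ValueError(f"expected a Point (type 1), got type {geom_type}")
    return struct.unpack_from(order + "dd", buf, 5)


wkb = point_to_wkb(-122.4194, 37.7749)
print(len(wkb))           # 21 bytes: 1 + 4 + 8 + 8
print(wkb_to_point(wkb))  # (-122.4194, 37.7749)
```

Because the bytes are self-describing (byte order and type come first), any reader that understands WKB can decode the column without knowing which tool wrote it, which is the interoperability GeoParquet builds on.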
Read the specification for the v1.1.0 release (or see the metadata schema). Find links to older releases on the release page.
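In the specification, a GeoParquet file is a regular Parquet file whose file-level key-value metadata carries a `geo` key holding a JSON document. Its required top-level fields are `version`, `primary_column` and `columns`, and each entry in `columns` must give at least an `encoding` and a `geometry_types` list. The sketch below builds such a document with the standard library and runs a partial sanity check. `make_geo_metadata` and `check_geo_metadata` are hypothetical helpers, and the check looks only at those required fields, not the full metadata schema; a dedicated validator such as gpq is the better tool for real files.

```python
import json


def make_geo_metadata(primary_column="geometry", geometry_types=(), bbox=None):
    """Build a minimal GeoParquet 'geo' metadata document for one WKB column."""
    column = {
        "encoding": "WKB",
        # An empty list means the geometry types are not known.
        "geometry_types": list(geometry_types),
    }
    if bbox is not None:
        column["bbox"] = list(bbox)  # [xmin, ymin, xmax, ymax]
    # No "crs" key: the spec then treats coordinates as OGC:CRS84 (lon/lat).
    return {
        "version": "1.1.0",
        "primary_column": primary_column,
        "columns": {primary_column: column},
    }


def check_geo_metadata(meta):
    """Return a list of problems with the required fields (empty list = none found)."""
    problems = [f"missing top-level key: {k}"
                for k in ("version", "primary_column", "columns") if k not in meta]
    columns = meta.get("columns", {})
    if "primary_column" in meta and meta["primary_column"] not in columns:
        problems.append("primary_column is not described in columns")
    for name, col in columns.items():
        for key in ("encoding", "geometry_types"):
            if key not in col:
                problems.append(f"column {name!r} is missing {key}")
    return problems


meta = make_geo_metadata(geometry_types=["Point"], bbox=[-180, -90, 180, 90])
print(json.dumps(meta))          # the string a writer stores under the "geo" key
print(check_geo_metadata(meta))  # []
```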
For more information, see the goals and features section of the readme in the GeoParquet repository. There is also a nice deep dive on Parquet and GeoParquet in the blog post Introducing the GeoParquet data format, and we'll soon be expanding this website with more details.
Why GeoParquet?
- Standard Geospatial Data in Parquet: following GeoParquet's structure enables interoperability between any systems that read or write spatial data in Parquet.
- Columnar Data for Geo: data science workflows benefit from columnar data formats, and geospatial analysis can tap into their innovations.
- Cloud Data Warehouse Interoperability: Snowflake, BigQuery, Redshift and Databricks can all work together seamlessly with the same geospatial data format.
Who is involved in GeoParquet?
Software
GeoParquet is rapidly maturing, with a number of new software libraries and tools coming online.
Tools
- Browser-based converter: powered by the gpq library, it converts GeoJSON to GeoParquet and vice versa directly in your browser.
- GeoPandas (Python) extends the datatypes used by pandas to allow spatial operations on geometric types and supports reading and writing GeoParquet.
- QGIS ships with GeoParquet support on Windows and Linux; on macOS it works when installed with conda (in a terminal with conda activated, run 'conda config --add channels conda-forge', then 'conda install qgis libgdal-arrow-parquet', then type 'qgis' to launch it). The GeoParquet Downloader Plugin enables easy streaming downloads from large online GeoParquet datasets.
- Scribble Maps is a full-featured web app that supports both import & export of GeoParquet.
- CARTO is a geospatial platform and supports import of GeoParquet.
- gpq provides a command-line interface to validate and describe any GeoParquet file. It can also convert GeoParquet to and from GeoJSON.
- stac-geoparquet converts STAC catalogs into GeoParquet.
- Apache Sedona is a cluster computing system for processing large-scale spatial data that extends existing cluster computing systems like Apache Spark & Apache Flink. It can load and save GeoParquet with Scala, Java, Python or R.
- Esri's ArcGIS GeoAnalytics Engine 'delivers spatial analysis to your big data by extending Apache Spark with ready-to-use SQL functions and analysis tools'. It can load or save GeoParquet with the Python library or the Spark plugin; see their GeoParquet page for more details. ArcGIS Pro can also read and write GeoParquet with the Data Interoperability Extension.
- FME by Safe Software is a no-code platform that effortlessly integrates your data, including read and write support for GeoParquet starting in version 23.1.
- SeerAI's Geodesic Platform is a cloud-native, planetary-scale Spatiotemporal Data Mesh and Data Fusion platform. Geodesic's Boson Service Mesh supports GeoParquet natively and can expose massive GeoParquet datasets as compatible formats to other analytical systems and geospatial software via APIs. All tabular and feature data outputs are written in Parquet/GeoParquet format.
- Wherobots provides a fully managed cloud spatial data lakehouse that can manage and analyze geospatial data at any scale. All data on Wherobots can be saved in GeoParquet format and cataloged by its Havasu Spatial Table Format.
- pygeoapi is a Python server implementation of the OGC API suite of standards. It now supports a Parquet provider that allows publishing a GeoParquet file as an OGC API - Features collection.
- Fused is a data analytics platform that enables users to write and deploy Python User Defined Functions (UDFs) behind HTTP endpoints and interactive applications, with great support for geospatial data and GeoParquet.
- Felt is a cloud-native GIS platform helping users make maps, apps & dashboards in seconds, and supports GeoParquet importing.
- DuckDB is a fast, analytical, portable database, and its spatial extension can read and write GeoParquet files.
- GeoParquet Tools can check GeoParquet best practices, spatially order GeoParquet files (using DuckDB's Hilbert curve), and partition GeoParquet data.
- Google's BigQuery data warehouse supports loading and writing GeoParquet.
- Atlas is a browser-based GIS platform with collaboration capabilities that provides visualization and analysis of a variety of formats, including GeoParquet.
- Kepler GL 3.1 is an open-source geospatial analysis tool for large-scale datasets, and it can load and display GeoParquet (source code).
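The GeoParquet Tools entry above mentions spatially ordering files along a Hilbert curve. The idea is to sort rows by their position along a space-filling curve, so features that are close on the map tend to land in the same row groups, which keeps each row group's bounding box small. The sketch below is the classic Hilbert `xy2d` algorithm in plain Python, offered only to illustrate the idea; it is not DuckDB's implementation, and `hilbert_key` with its `bbox` and `order` parameters is a hypothetical helper.

```python
def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve inside it has the right orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n, x, y):
    """Distance along a Hilbert curve filling an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_key(lon, lat, bbox, order=16):
    """Map a coordinate inside bbox = (xmin, ymin, xmax, ymax) to a Hilbert index."""
    xmin, ymin, xmax, ymax = bbox
    n = 1 << order
    # Scale into integer grid cells, clamping so points on the edges stay in range.
    x = min(n - 1, max(0, int((lon - xmin) / (xmax - xmin) * n)))
    y = min(n - 1, max(0, int((lat - ymin) / (ymax - ymin) * n)))
    return xy2d(n, x, y)


# Hypothetical (lon, lat) points; sorting by key groups nearby points together.
points = [(10.0, 50.0), (-74.0, 40.7), (11.0, 50.5), (-73.9, 40.8)]
world = (-180.0, -90.0, 180.0, 90.0)
points.sort(key=lambda p: hilbert_key(p[0], p[1], world))
print(points)
```

A Hilbert curve is often preferred over simpler orderings (such as sorting by longitude alone) because consecutive positions on the curve are always neighboring grid cells, so locality is preserved in both dimensions.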
Libraries
- geoarrow (R)
- sfarrow (R)
- GDAL/OGR (C++, bindings in several languages)
- GeoParquet.jl (Julia)
- gpq (Go and WASM)
- Fiona (Python, as of version 1.9.4; the GeoParquet driver is only available if your system's GDAL library links libarrow, and the Fiona wheels on PyPI do not include libarrow because it is rather large)
- .NET 6 library (.NET)
- C++ example code - see this discussion topic for more info.
- loaders.gl (JavaScript)
- GeoParquet.js (JavaScript)
Data Providers & Public Data
There are many sources of GeoParquet data, with more and more coming online all the time. If you have or know of a good source of GeoParquet data please let us know!
- Overture Maps Foundation provides global data across six data themes (addresses, base, buildings, divisions, places, and transportation), using well-partitioned GeoParquet as their primary distribution format across multiple clouds. It consists of billions of features across hundreds of gigabytes.
- Microsoft provides access to all Planetary Computer STAC items as GeoParquet; see this quickstart guide for more information. Their Building Footprints are also distributed as GeoParquet.
- Planet provides their RapidAI4EO dataset's STAC items as GeoParquet; see the STAC Browser view of the data. They also provide a dataset of field boundaries across all of Europe, derived with ML.
- source.coop provides numerous datasets in cloud-native geospatial formats, including over 60 in GeoParquet. The Google-Microsoft-OSM Open Buildings - combined by VIDA dataset has over 2.2 billion building footprints across the globe, and the fiboa organization provides numerous field boundary datasets from a variety of countries, all in GeoParquet.
- Foursquare's Open Source Places provides over 100 million points of interest, available as GeoParquet on Hugging Face.
- emotional.byteroad.net provides most of its 100+ datasets in GeoParquet, with the GeoParquet files all linked through the metadata records.
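Large providers such as Overture publish "well-partitioned" GeoParquet. One reason partitioning pays off is that a reader who knows each partition's (or row group's) bounding box can skip every piece whose box does not overlap the area being queried. Where those boxes come from (file-level `bbox` metadata, statistics on the v1.1.0 bbox covering column, or a separate index) is left open here. The sketch below shows only the pruning test itself; the partition names and extents are hypothetical, and boxes crossing the antimeridian are not handled.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes overlap or touch (no antimeridian handling)."""
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax


def partitions_to_read(partition_bboxes: dict[str, BBox], query: BBox) -> list[str]:
    """Names of partitions whose extent overlaps the query box; the rest can be skipped."""
    return sorted(name for name, box in partition_bboxes.items() if intersects(box, query))


# Hypothetical partition extents, for illustration only.
partitions = {
    "part-europe": BBox(-10.0, 35.0, 30.0, 60.0),
    "part-east-us": BBox(-80.0, 25.0, -65.0, 45.0),
}
print(partitions_to_read(partitions, BBox(2.0, 48.0, 3.0, 49.0)))  # ['part-europe']
```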