GeoParquet

Geospatial data in Apache Parquet

Encoding geospatial data in Apache Parquet.

About

Apache Parquet is a powerful column-oriented data format, built from the ground up as a modern alternative to CSV files. GeoParquet is an incubating Open Geospatial Consortium (OGC) standard that adds interoperable geospatial types (Point, Line, Polygon) to Parquet.

Read the specification for the v1.1.0 release (or see the metadata schema). Find links to older releases on the release page.

For more information, see the goals and features section of the readme in the GeoParquet repository. There is also a nice deep dive on Parquet and GeoParquet in the blog post "Introducing the GeoParquet data format". We'll soon be expanding this website with more details.
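As a rough illustration of what "geospatial types in Parquet" means in practice, the sketch below uses only the Python standard library to encode a 2D point as Well-Known Binary (WKB) and to build a minimal "geo" metadata document of the kind the specification describes. The function names are hypothetical, and the field names (version, primary_column, columns, encoding, geometry_types) reflect our reading of the v1.1.0 metadata schema; check the specification before relying on them. In real workflows a library such as GeoPandas writes this metadata for you.

```python
import json
import struct
from typing import Sequence, Tuple


def encode_point_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB.

    Layout: 1 byte for byte order (1 = little-endian), a uint32 geometry
    type (1 = Point), then the x and y coordinates as 64-bit floats.
    """
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_point_wkb(wkb: bytes) -> Tuple[float, float]:
    """Decode a 2D WKB point produced by encode_point_wkb (or any conforming writer)."""
    endian = "<" if wkb[0] == 1 else ">"
    (geometry_type,) = struct.unpack(endian + "I", wkb[1:5])
    if geometry_type != 1:
        raise ValueError(f"expected WKB Point (type 1), got type {geometry_type}")
    x, y = struct.unpack(endian + "dd", wkb[5:21])
    return x, y


def build_geo_metadata(
    column: str = "geometry",
    geometry_types: Sequence[str] = ("Point",),
) -> str:
    """Build a minimal 'geo' metadata JSON string (hypothetical helper).

    Assumption: only the fields we understand to be required are included.
    """
    metadata = {
        "version": "1.1.0",
        "primary_column": column,
        "columns": {
            column: {
                "encoding": "WKB",
                "geometry_types": list(geometry_types),
            }
        },
    }
    return json.dumps(metadata)


if __name__ == "__main__":
    wkb = encode_point_wkb(-122.4, 37.8)
    print(len(wkb), decode_point_wkb(wkb))
    print(build_geo_metadata())
```

The JSON string would be stored under the "geo" key of the Parquet file's key-value metadata, while each geometry value sits in an ordinary binary column. This is what lets any Parquet reader open the file even if it knows nothing about geometry.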

Why GeoParquet?

  • Standard Geospatial Data in Parquet

    Following GeoParquet's structure enables interoperability between any system that reads or writes spatial data in Parquet

  • Columnar Data for Geo

    Data science workflows benefit from columnar data formats, and geospatial analysis can tap into their innovations

  • Cloud Data Warehouse Interoperability

    Snowflake, BigQuery, Redshift, and Databricks can all work together seamlessly with the same geospatial data format

Who is involved in GeoParquet?

Software

GeoParquet is rapidly maturing, with a number of new software libraries and tools coming online.

Tools

  • Browser-based converter: powered by the GPQ library, it converts GeoJSON to GeoParquet and vice versa, all within your browser.
  • GeoPandas (Python) extends the datatypes used by pandas to allow spatial operations on geometric types and supports reading and writing GeoParquet.
  • QGIS ships with GeoParquet support on Windows and Linux. On Mac it can work when installed with conda: from a terminal with conda activated, run 'conda config --add channels conda-forge' and 'conda install qgis libgdal-arrow-parquet', then type 'qgis'. The GeoParquet Downloader Plugin enables easy streaming downloads from large online GeoParquet datasets.
  • Scribble Maps is a full-featured web app that supports both import & export of GeoParquet.
  • CARTO is a geospatial platform and supports import of GeoParquet.
  • gpq provides a command-line interface to validate and describe any GeoParquet file. It can also convert GeoParquet to and from GeoJSON.
  • stac-geoparquet converts STAC catalogs into GeoParquet.
  • Apache Sedona is a cluster computing system for processing large-scale spatial data that extends existing cluster computing systems like Apache Spark & Apache Flink. It can load and save GeoParquet with Scala, Java, Python or R.
  • Esri's ArcGIS GeoAnalytics Engine 'delivers spatial analysis to your big data by extending Apache Spark with ready-to-use SQL functions and analysis tools'. It can load or save GeoParquet with the Python library or the Spark plugin; see their GeoParquet page for more details. ArcGIS Pro can also read and write GeoParquet with the Data Interoperability Extension.
  • FME by Safe Software is a no-code platform that effortlessly integrates your data, with read and write support for GeoParquet starting in version 23.1.
  • SeerAI's Geodesic Platform is a cloud-native, planetary scale Spatiotemporal Data Mesh and Data Fusion platform. Geodesic's Boson Service Mesh supports GeoParquet natively and can expose massive GeoParquet datasets as compatible formats to other analytical systems and geospatial software via APIs. All tabular and feature data outputs are written in Parquet/GeoParquet format.
  • Wherobots provides a fully-managed cloud spatial data lakehouse that can manage and analyze geospatial data at any scale. All data on Wherobots can be saved in GeoParquet format and cataloged by its Havasu Spatial Table Format.
  • pygeoapi is a Python server implementation of the OGC API suite of standards. It now supports a Parquet provider that allows publishing a GeoParquet file as an OGC API - Features collection.
  • Fused is a data analytics platform that enables users to write and deploy Python User Defined Functions (UDFs) behind HTTP endpoints and interactive applications, with great support for geospatial data and GeoParquet.
  • Felt is a cloud-native GIS platform helping users make maps, apps & dashboards in seconds, and supports GeoParquet importing.
  • DuckDB is a fast, analytical, portable database, and its spatial extension can read and write GeoParquet files.
  • GeoParquet Tools can check GeoParquet best practices, spatially order GeoParquet files (using DuckDB's Hilbert curve), and partition GeoParquet data.
  • Google's BigQuery data warehouse supports loading and writing GeoParquet.
  • Atlas is a browser-based GIS platform with collaboration capabilities that provides visualization and analysis of a variety of formats, including GeoParquet.
  • Kepler GL 3.1 is an open source geospatial analysis tool for large-scale data sets, and it can load and display GeoParquet (source code).

Libraries

Data Providers & Public Data

There are many sources of GeoParquet data, with more and more coming online all the time. If you have or know of a good source of GeoParquet data please let us know!