GitHub - terrafloww/rasteret: Rasteret is a library for 20x+ faster reads of GeoTIFF than Rasterio/GDAL. Interops with TorchGeo, Xarray, DuckDB, Polars

Build a collection once. Query it like a table. Read pixels 20x faster from cloud COGs.

Rasteret is an index-first reader for cloud-hosted tiled GeoTIFFs and COGs. It builds a queryable Arrow/Parquet collection with scene metadata, asset URLs, CRS sidecars, and parsed COG header metadata. Pixels stay in the original COGs.

After that, you can filter, join, and enrich the collection as a table, then read only the pixels you need into NumPy, xarray, GeoPandas, TorchGeo, or Arrow point-sample tables.

STAC / Parquet / Arrow table -> Rasteret Collection -> NumPy / xarray / GeoPandas / TorchGeo
external labels / plots / points    filter/join/share          read pixels on demand

Why Rasteret

Remote raster workflows often repeat the same setup work: STAC loops, COG header parsing, tile byte-range planning, CRS transforms, retries, and output assembly.

Rasteret moves the expensive raster metadata discovery into a Collection build step and reuses that metadata for later reads.

That helps when you:

train or evaluate models over many remote COG scenes
repeatedly sample the same imagery with different AOIs, points, labels, or splits
avoid rediscovering raster header metadata in new notebooks, containers, or machines
want one source collection to feed TorchGeo, xarray, NumPy, GeoPandas, and Arrow tools
need DuckDB, Polars, PyArrow, or GeoPandas to work on metadata and external geometries before pixel reads

Quick Example

import rasteret

sentinel2_collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="s2_bangalore",
    bbox=(77.5, 12.9, 77.7, 13.1),
    date_range=("2024-01-01", "2024-01-31"),
)

clear = sentinel2_collection.subset(cloud_cover_lt=50)

arr = clear.get_numpy(
    geometries=(77.55, 13.01, 77.58, 13.08),
    bands=["B04", "B08"],
)

The same collection can feed a downstream chip-based dataset:

metadata = clear.to_table(
    columns=["id", "datetime", "geometry", "proj:epsg", "B04_metadata", "B03_metadata", "B02_metadata", "B08_metadata"],
)

Bring Your Own Geometry And Metadata

Rasteret works well with the table tools you already use. External labels, farm plots, asset locations, fire boundaries, or point samples can stay in GeoPandas, DuckDB, Polars, or PyArrow until you need pixels.

import duckdb
import geopandas as gpd
import rasteret
from shapely.geometry import box

plots = gpd.GeoDataFrame(
    {
        "plot_id": ["plot-a"],
        "crop": ["rice"],
    },
    geometry=[box(77.55, 13.01, 77.58, 13.08)],
    crs="OGC:CRS84",
)
plots_arrow = plots.to_arrow(geometry_encoding="WKB")

con = duckdb.connect()
con.sql("INSTALL spatial; LOAD spatial;")
con.register("sen2_rasteret", clear)
con.register("plots", plots_arrow)

# Bring your own geometries
plot_aois = con.sql("""
    SELECT
        plots.plot_id,
        plots.crop,
        plots.geometry AS plot_geometry
    FROM sen2_rasteret, plots
    WHERE sen2_rasteret."eo:cloud_cover" < 10
      AND ST_Intersects(
          ST_GeomFromWKB(sen2_rasteret.geometry),
          ST_GeomFromWKB(plots.geometry)
      )
""")

plot_patches = clear.get_gdf(
    geometries=plot_aois,
    geometry_column="plot_geometry",
    geometry_crs=4326,
    bands=["B04", "B08"],
)

The same pattern works with Polars or PyArrow for split/label columns, and with sample_points(...) when your external data is point-based. get_gdf(...) and sample_points(...) keep business columns such as plot_id in their outputs.

What You Can Do

Task	Rasteret surface
Build from a registered dataset	`rasteret.build("catalog/id", ...)`
Build from your own Parquet, GeoParquet, DuckDB, Polars, or Arrow record table	`rasteret.build_from_table(...)`
Reopen a saved or prebuilt Collection	`rasteret.load(path_or_dataset_id)`
Re-wrap a read-ready Arrow object	`rasteret.as_collection(...)`
Get numpy arrays	`Collection.get_numpy(...)`
Get xarray dataset	`Collection.get_xarray(...)`
Get GeoPandas rows with pixel arrays	`Collection.get_gdf(...)`
Sample pixels at points	`Collection.sample_points(...)`
Build a downstream TorchGeo dataset	`Collection.to_table(...)` + `Collection.read_window(...)`

Dataset Catalog

Rasteret ships with dataset IDs so you do not have to remember STAC endpoints, band maps, license metadata, or cloud access settings. Most catalog entries are recipes for rasteret.build(...): Rasteret searches the source catalog, parses the COG metadata once, and writes a reusable local Collection.

Only one built-in ID is already a read-ready Rasteret Collection: aef/v1-annual. Use rasteret.load("aef/v1-annual") for AlphaEarth Foundation Embeddings. The built-in alias loads Rasteret's maintained Source Cooperative Collection. You do not need to call build() for this dataset.

ID	Dataset	Coverage	Auth	Use
`aef/v1-annual`	AlphaEarth Foundation Embeddings (Annual)	global	none	`rasteret.load(...)`
`earthsearch/sentinel-2-l2a`	Sentinel-2 Level-2A	global	none	`rasteret.build(...)`
`earthsearch/landsat-c2-l2`	Landsat Collection 2 Level-2	global	required	`rasteret.build(...)`
`earthsearch/naip`	NAIP	north-america	required	`rasteret.build(...)`
`earthsearch/cop-dem-glo-30`	Copernicus DEM 30m	global	none	`rasteret.build(...)`
`earthsearch/cop-dem-glo-90`	Copernicus DEM 90m	global	none	`rasteret.build(...)`
`pc/sentinel-2-l2a`	Sentinel-2 Level-2A (Planetary Computer)	global	required	`rasteret.build(...)`
`pc/io-lulc-annual-v02`	ESRI 10m Land Use/Land Cover	global	required	`rasteret.build(...)`
`pc/alos-dem`	ALOS World 3D 30m DEM	global	required	`rasteret.build(...)`
`pc/nasadem`	NASADEM	global	required	`rasteret.build(...)`
`pc/esa-worldcover`	ESA WorldCover	global	required	`rasteret.build(...)`
`pc/usda-cdl`	USDA Cropland Data Layer	conus	required	`rasteret.build(...)`

You can browse the same list from the CLI:

rasteret datasets list
rasteret datasets info aef/v1-annual

To make your own dataset ID for a reusable local collection or Parquet record table, see Register A Local Collection into Dataset Catalog.

Performance

Rasteret is 10x to 20x faster than rasterio/GDAL

Scenario	TorchGeo/rasterio	Rasteret	Speedup
Single AOI, 15 scenes	9.08 s	1.14 s	8.0x
Multi-AOI, 30 scenes	42.05 s	2.25 s	18.7x
Cross-CRS, 12 scenes	12.47 s	0.59 s	21.3x

Rasteret also compares well against time-series workflows that use Google Earth Engine or thread-pooled rasterio for the measured setup:

Library	First run (cold)	Subsequent runs (hot)
Rasterio + ThreadPool	32 s	24 s
Google Earth Engine	10-30 s	3-5 s
Rasteret	3 s	3 s

See the Benchmarks guide for methodology, environment details, and additional Hugging Face datasets comparisons.

Install

Optional integrations:

uv pip install "rasteret[aws]"
uv pip install "rasteret[azure]"
uv pip install "rasteret[all]"  # all optional integrations for exploration

Rasteret requires Python 3.12 or later.

Learn More

License

Code: Apache-2.0