Build a collection once. Query it like a table. Read pixels 20x faster from cloud COGs.
Rasteret is an index-first reader for cloud-hosted tiled GeoTIFFs and COGs. It builds a queryable Arrow/Parquet collection with scene metadata, asset URLs, CRS sidecars, and parsed COG header metadata. Pixels stay in the original COGs.
After that, you can filter, join, and enrich the collection as a table, then read only the pixels you need into NumPy, xarray, GeoPandas, TorchGeo, or Arrow point-sample tables.
```
STAC / Parquet / Arrow table  ->  Rasteret Collection    ->  NumPy / xarray / GeoPandas / TorchGeo
external labels / plots /         filter / join / share      read pixels on demand
points
```
Why Rasteret
Remote raster workflows often repeat the same setup work: STAC loops, COG header parsing, tile byte-range planning, CRS transforms, retries, and output assembly.
Rasteret moves the expensive raster metadata discovery into a one-time Collection build step and reuses that metadata for all later reads.
That helps when you:
- train or evaluate models over many remote COG scenes
- repeatedly sample the same imagery with different AOIs, points, labels, or splits
- avoid rediscovering raster header metadata in new notebooks, containers, or machines
- want one source collection to feed TorchGeo, xarray, NumPy, GeoPandas, and Arrow tools
- need DuckDB, Polars, PyArrow, or GeoPandas to work on metadata and external geometries before pixel reads
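To make "raster header metadata" concrete: a COG read starts by parsing the TIFF header to find where the image metadata lives. The following is a minimal, hypothetical sketch of that first parsing step using only the standard library; it is an illustration of the kind of work Rasteret does once per asset, not Rasteret's internal code.

```python
import struct

def parse_tiff_header(first_bytes: bytes) -> dict:
    """Parse byte order, magic number, and first-IFD offset from a classic TIFF header."""
    byte_order = first_bytes[:2]
    if byte_order == b"II":
        endian = "<"  # little-endian
    elif byte_order == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", first_bytes[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    return {"endian": endian, "ifd_offset": ifd_offset}

# A minimal little-endian classic-TIFF header: "II", magic 42, first IFD at byte 8.
header = b"II" + struct.pack("<HI", 42, 8)
print(parse_tiff_header(header))  # {'endian': '<', 'ifd_offset': 8}
```

In a remote-read workflow this parsing happens against byte ranges fetched over HTTP; caching the result is what makes repeat reads cheap.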
Quick Example
```python
import rasteret

sentinel2_collection = rasteret.build(
    "earthsearch/sentinel-2-l2a",
    name="s2_bangalore",
    bbox=(77.5, 12.9, 77.7, 13.1),
    date_range=("2024-01-01", "2024-01-31"),
)

clear = sentinel2_collection.subset(cloud_cover_lt=50)

arr = clear.get_numpy(
    geometries=(77.55, 13.01, 77.58, 13.08),
    bands=["B04", "B08"],
)
```
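A typical next step is band arithmetic on the returned pixels, for example NDVI from B04 (red) and B08 (NIR). Here is a minimal per-pixel sketch in plain Python, using Sentinel-2-style scaled-integer reflectance values; with the NumPy array from get_numpy you would vectorize the same expression. The exact shape of get_numpy's output is an assumption here.

```python
def ndvi(red: float, nir: float) -> float:
    """Normalized Difference Vegetation Index for one pixel pair."""
    denom = nir + red
    return (nir - red) / denom if denom else 0.0

# Sentinel-2 L2A stores reflectance as scaled integers, so inputs like these are typical.
print(ndvi(red=1000, nir=3000))  # 0.5
```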
The same collection can feed TorchGeo:
```python
dataset = clear.to_torchgeo_dataset(
    bands=["B04", "B03", "B02", "B08"],
    chip_size=256,
)
```
Bring Your Own Geometry And Metadata
Rasteret works well with the table tools you already use. External labels, farm plots, asset locations, fire boundaries, or point samples can stay in GeoPandas, DuckDB, Polars, or PyArrow until you need pixels.
```python
import duckdb
import geopandas as gpd
import rasteret
from shapely.geometry import box

plots = gpd.GeoDataFrame(
    {
        "plot_id": ["plot-a"],
        "crop": ["rice"],
    },
    geometry=[box(77.55, 13.01, 77.58, 13.08)],
    crs="OGC:CRS84",
)
plots_arrow = plots.to_arrow(geometry_encoding="WKB")

con = duckdb.connect()
con.sql("INSTALL spatial; LOAD spatial;")
con.register("sen2_rasteret", clear)
con.register("plots", plots_arrow)  # Bring your own geometries

plot_aois = con.sql("""
    SELECT
        plots.plot_id,
        plots.crop,
        plots.geometry AS plot_geometry
    FROM sen2_rasteret, plots
    WHERE sen2_rasteret."eo:cloud_cover" < 10
      AND ST_Intersects(
          ST_GeomFromWKB(sen2_rasteret.geometry),
          ST_GeomFromWKB(plots.geometry)
      )
""")

plot_patches = clear.get_gdf(
    geometries=plot_aois,
    geometry_column="plot_geometry",
    geometry_crs=4326,
    bands=["B04", "B08"],
)
```
The same pattern works with Polars or PyArrow for split/label columns, and with
sample_points(...) when your external data is point-based. get_gdf(...) and
sample_points(...) keep business columns such as plot_id in their outputs.
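Conceptually, "keeping business columns" means each output row carries the input row's attributes alongside the sampled pixel values. Here is a pure-Python sketch of that contract, not Rasteret's implementation; the sample_value callable stands in for an actual pixel read.

```python
def sample_points(points, sample_value):
    """Return one output row per input point, preserving non-geometry columns."""
    out = []
    for row in points:
        value = sample_value(row["x"], row["y"])  # stand-in for a pixel read
        attrs = {k: v for k, v in row.items() if k not in ("x", "y")}
        out.append({**attrs, "value": value})
    return out

points = [
    {"plot_id": "plot-a", "crop": "rice", "x": 77.56, "y": 13.02},
    {"plot_id": "plot-b", "crop": "maize", "x": 77.60, "y": 13.05},
]
rows = sample_points(points, sample_value=lambda x, y: round(x + y, 2))
print([r["plot_id"] for r in rows])  # ['plot-a', 'plot-b']
```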
What You Can Do
| Task | Rasteret surface |
|---|---|
| Build from a registered dataset | rasteret.build("catalog/id", ...) |
| Build from your own Parquet, GeoParquet, DuckDB, Polars, or Arrow record table | rasteret.build_from_table(...) |
| Reopen a saved or prebuilt Collection | rasteret.load(path_or_dataset_id) |
| Re-wrap a read-ready Arrow object | rasteret.as_collection(...) |
| Get numpy arrays | Collection.get_numpy(...) |
| Get xarray dataset | Collection.get_xarray(...) |
| Get GeoPandas rows with pixel arrays | Collection.get_gdf(...) |
| Sample pixels at points | Collection.sample_points(...) |
| Train/infer with TorchGeo | Collection.to_torchgeo_dataset(...) |
Dataset Catalog
Rasteret ships with dataset IDs so you do not have to remember STAC endpoints,
band maps, license metadata, or cloud access settings. Most catalog entries are
recipes for rasteret.build(...): Rasteret searches the source catalog, parses
the COG metadata once, and writes a reusable local Collection.
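Conceptually, a catalog entry bundles everything a build needs to know about a source: endpoint, collection name, auth requirement, and band naming. The sketch below is a hypothetical recipe record; the field names and the resolve helper are illustrative, not Rasteret's actual schema (the Earth Search endpoint URL is the public element84 v1 API).

```python
# Hypothetical catalog-recipe structure, for illustration only.
CATALOG = {
    "earthsearch/sentinel-2-l2a": {
        "stac_endpoint": "https://earth-search.aws.element84.com/v1",
        "stac_collection": "sentinel-2-l2a",
        "auth": None,  # no credentials required
        "bands": {"B04": "red", "B08": "nir"},
    },
}

def resolve(dataset_id: str) -> dict:
    """Look up the recipe for a registered dataset ID."""
    try:
        return CATALOG[dataset_id]
    except KeyError:
        raise KeyError(f"unknown dataset id: {dataset_id}") from None

print(resolve("earthsearch/sentinel-2-l2a")["stac_collection"])  # sentinel-2-l2a
```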
Only one built-in ID is already a read-ready Rasteret Collection: aef/v1-annual.
Use rasteret.load("aef/v1-annual") for AlphaEarth Foundation Embeddings; the
built-in alias loads Rasteret's maintained Source Cooperative Collection, so no
build() call is needed for this dataset.
| ID | Dataset | Coverage | Auth | Use |
|---|---|---|---|---|
| aef/v1-annual | AlphaEarth Foundation Embeddings (Annual) | global | none | rasteret.load(...) |
| earthsearch/sentinel-2-l2a | Sentinel-2 Level-2A | global | none | rasteret.build(...) |
| earthsearch/landsat-c2-l2 | Landsat Collection 2 Level-2 | global | required | rasteret.build(...) |
| earthsearch/naip | NAIP | north-america | required | rasteret.build(...) |
| earthsearch/cop-dem-glo-30 | Copernicus DEM 30m | global | none | rasteret.build(...) |
| earthsearch/cop-dem-glo-90 | Copernicus DEM 90m | global | none | rasteret.build(...) |
| pc/sentinel-2-l2a | Sentinel-2 Level-2A (Planetary Computer) | global | required | rasteret.build(...) |
| pc/io-lulc-annual-v02 | ESRI 10m Land Use/Land Cover | global | required | rasteret.build(...) |
| pc/alos-dem | ALOS World 3D 30m DEM | global | required | rasteret.build(...) |
| pc/nasadem | NASADEM | global | required | rasteret.build(...) |
| pc/esa-worldcover | ESA WorldCover | global | required | rasteret.build(...) |
| pc/usda-cdl | USDA Cropland Data Layer | conus | required | rasteret.build(...) |
You can browse the same list from the CLI:
```bash
rasteret datasets list
rasteret datasets info aef/v1-annual
```
To make your own dataset ID for a reusable local collection or Parquet record table, see Register A Local Collection into Dataset Catalog.
Performance
In the benchmarked read scenarios below, Rasteret is roughly 8x to 21x faster than TorchGeo/rasterio:
| Scenario | TorchGeo/rasterio | Rasteret | Speedup |
|---|---|---|---|
| Single AOI, 15 scenes | 9.08 s | 1.14 s | 8.0x |
| Multi-AOI, 30 scenes | 42.05 s | 2.25 s | 18.7x |
| Cross-CRS, 12 scenes | 12.47 s | 0.59 s | 21.3x |
On the same measured workload, Rasteret also compares well against time-series workflows built on Google Earth Engine or thread-pooled rasterio:
| Library | First run (cold) | Subsequent runs (hot) |
|---|---|---|
| Rasterio + ThreadPool | 32 s | 24 s |
| Google Earth Engine | 10-30 s | 3-5 s |
| Rasteret | 3 s | 3 s |
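The identical cold and hot times reflect the design: header metadata is discovered once at build time and reused, so subsequent reads skip discovery entirely. Here is a toy sketch of that caching idea, not Rasteret's actual code; the fetch callable stands in for an expensive network request.

```python
class HeaderCache:
    """Fetch each asset's header metadata at most once, then serve from cache."""

    def __init__(self, fetch):
        self._fetch = fetch   # expensive network call in real life
        self._cache = {}
        self.fetch_count = 0  # how many real fetches happened

    def header(self, url: str) -> dict:
        if url not in self._cache:
            self.fetch_count += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]

cache = HeaderCache(fetch=lambda url: {"tile_size": 512})
for _ in range(100):  # 100 reads of the same scene
    cache.header("s3://bucket/scene.tif")
print(cache.fetch_count)  # 1
```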
See the Benchmarks guide
for methodology, environment details, and additional Hugging Face datasets
comparisons.
Install
Install the base package with uv (or pip):

```bash
uv pip install rasteret
```

Optional integrations:

```bash
uv pip install "rasteret[torchgeo]"
uv pip install "rasteret[aws]"
uv pip install "rasteret[azure]"
uv pip install "rasteret[all]"   # all optional integrations for exploration
```
Rasteret requires Python 3.12 or later.
Learn More
- Getting Started
- Build from Parquet and Arrow Tables
- Bring Your Own AOIs, Points, And Metadata
- TorchGeo Integration
- Benchmarks
- API Reference
License
Code: Apache-2.0

