Show HN: Build ML training datasets from large-scale satellite/aerial imagery

2 points by noahgolmant 11 days ago · 0 comments · 2 min read


This is a small tool to label bounding boxes on satellite/aerial imagery and export training datasets for object detection.

Web maps like Google Earth work by stitching together lots of small images called tiles (this is why you see square patches as the page loads). They do this by querying a "tile server" API that reads from sharded raster files in cloud storage. In my day job we built infra to efficiently serve imagery through tile servers for map visualization, and I wanted to test out ML applications of that infra. It turns out the same tile server interface can also be used to label map imagery and fine-tune models on it.
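To make the tile idea concrete, here is a minimal sketch of how a longitude/latitude maps to a tile index under the common XYZ ("slippy map") Web Mercator scheme, and how that index fills in a tile URL. This is not the tool's actual code. The `{z}/{x}/{y}` template layout and the example host are assumptions, and real tile servers may use a different scheme.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a lon/lat point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    # Web Mercator projection of latitude, scaled to the [0, n) tile range.
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge (or beyond the Mercator latitude
    # limit) still land on a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_url(template, zoom, x, y):
    """Fill a hypothetical {z}/{x}/{y} URL template for one tile."""
    return template.format(z=zoom, x=x, y=y)


if __name__ == "__main__":
    # Placeholder host, not a real tile server.
    template = "https://tiles.example.com/{z}/{x}/{y}.png"
    zoom = 16
    x, y = lonlat_to_tile(-122.4194, 37.7749, zoom)
    print(tile_url(template, zoom, x, y))
```

Each zoom level doubles the number of tiles per axis, so a labeled box can always be tied back to a specific tile image at a specific zoom.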

This tool lets you point at any tile server URL, draw labeled bounding boxes, and export the labels in COCO annotation format, along with the underlying tile PNGs for training/inference. The output can feed directly into standard computer vision frameworks like Ultralytics or PyTorch.
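For readers who haven't seen COCO annotations, here is a rough sketch of what such an export might look like for box labels on tile PNGs. The `build_coco` helper, its input tuples, and the file names are hypothetical and only illustrate the general COCO detection layout (`images`, `annotations`, `categories`, with `bbox` as `[x, y, width, height]` in pixels). The tool's real output may carry different or additional fields.

```python
import json


def build_coco(tiles, boxes, category_names):
    """Assemble a COCO-style detection dict.

    tiles: list of (file_name, width, height), one per tile PNG.
    boxes: list of (image_id, category_id, x_min, y_min, x_max, y_max)
           in pixel coordinates; ids are 1-based.
    category_names: list of label names; ids are assigned 1, 2, ...
    """
    images = [
        {"id": i, "file_name": name, "width": w, "height": h}
        for i, (name, w, h) in enumerate(tiles, start=1)
    ]
    categories = [
        {"id": i, "name": name}
        for i, name in enumerate(category_names, start=1)
    ]
    annotations = []
    for ann_id, (image_id, category_id, x0, y0, x1, y1) in enumerate(boxes, start=1):
        w, h = x1 - x0, y1 - y0
        annotations.append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": category_id,
            "bbox": [x0, y0, w, h],  # COCO uses top-left corner plus size
            "area": w * h,
            "iscrowd": 0,
        })
    return {"images": images, "annotations": annotations, "categories": categories}


if __name__ == "__main__":
    # Hypothetical file names; the second tile has no boxes, so it acts
    # as a negative example.
    tiles = [("16_10482_25332.png", 256, 256), ("16_10483_25332.png", 256, 256)]
    boxes = [(1, 1, 10, 20, 50, 80)]
    coco = build_coco(tiles, boxes, ["building"])
    print(json.dumps(coco, indent=2))
```

In this sketch a tile with no entries in `annotations` still appears in `images`, which is one common way to represent negative examples in a COCO file.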

The workflow is hotkey-driven: draw a box, press 1-9 to assign a category (or N for negative examples). You can also drag-and-drop local GeoTIFFs.

I found this helpful for experimenting with fine-tuning SAM 3 on local aerial imagery. It was nice to zip up the PNGs + COCO file, drag and drop them into a Colab notebook, and run inference.

There are other interesting applications of this I'd like to explore, like in-browser map-based segmentation / object detection with ONNX.

