ApertureLab · Synthetic Aperture Sonar Simulator

The seafloor is dark. It doesn't have to be.

ApertureLab is a synthetic aperture sonar simulation and beamforming workbench, built to accelerate the AI systems that will map and understand the ocean floor at scale.

Why synthetic aperture? →

See the software in detail →

Isaac D. Gerg, Ph.D.

AI Scientist, ClimateAI · gergltd.com

Most underwater AI projects fail at the same seam: the ML team does not understand the acoustics, and the sonar team does not understand deep learning. I work at both ends. Twenty years of research spanning the complete signal chain (acoustic wave scattering and IQ time-series recording, through beamforming and image formation, to deep embeddings for ATR, segmentation, image compression, and making each stage robust to domain shift) means I can follow the signal from the seafloor to a model embedding and identify exactly where it breaks.

ApertureLab is what that understanding looks like as a piece of software.

Wave Scattering Physics · IQ Time-Series Processing · TDBP & ω-k Beamforming · Image Formation · Deep Embeddings · ATR · Segmentation · Image Compression · Physics-Based AI/ML · Domain Robustness

Foundation models for synthetic aperture radar (SAR) exist because satellite imagery was cheap, abundant, and labeled. Sonar data requires ships, dive operations, and classified access. ApertureLab generates physics-accurate labeled sonar data at satellite-imagery scale: the missing precondition that makes the following possible for the first time.

Synthetic Datasets

Million-image labeled datasets from a single workstation

Every simulated image carries ground-truth labels derived automatically from the scene: object class, position, orientation, burial depth, seafloor type, shadow mask. A dataset that would require decades of ship time to collect in the field can be generated overnight.

Object detection and segmentation labels
Pixel-accurate shadow and highlight masks
Full domain variation: depth, range, bottom type, aspect, burial
Rare-event coverage impossible to collect in the field

The data problem for underwater AI is a logistics problem. ApertureLab is the logistics solution.

Foundation Models

No general-purpose foundation model has been trained on SAS imagery. Yet.

Not because the architecture does not exist (ViT, MAE, and contrastive pretraining are all mature), but because training data at the necessary scale never existed. ApertureLab closes that gap with parametric scenes spanning the full space of acoustic environments.

Sim-to-real transfer for autonomous underwater vehicle perception
Environment-agnostic feature representations across seabed types
Edge-case and rare-target coverage unavailable in field collections
The sonar analogue to ImageNet-scale pretraining

This is the next rung. The ladder already exists.

Vision-Language Models

VLMs already respond to sonar. Fine-tuning is the obvious next step.

The 2026 IGARSS work shows VLMs classify SAS targets at 0.946 AUC using only a text prompt describing highlight-shadow geometry, with zero domain-specific training. Scene captions generated at render time make the fine-tuning step on a million labeled ApertureLab images straightforward.

Auto-generated natural-language captions paired with every image
Query sonar archives by English phrase
Caption imagery for non-expert operators and analysts
A path to zero-shot object recognition across unseen target types

The zero-shot baseline is 0.946 AUC. Domain fine-tuning is the straightforward next step.

Evidence

ApertureLab grid scene: sixty seafloor objects laid out across ten varied bottom types in a 6-by-10 grid, each cell with green auto-generated bounding boxes and labels, plus three zoom insets along the right edge showing the same target on ripple, rocky, and mud backgrounds.

A single English prompt to ApertureLab, written through Claude Code: “create a 60 by 100 m scene with a grid of different seafloor backgrounds and a target in the middle of each one, so I can crop 256 by 256 chips of the same object on different bottoms for ML training.” The image above is what came back, rendered overnight on one workstation, with physics through the entire chain. It holds sixty seafloor targets on ten varied bottom classes (clean sand, mud, rock, gravel, coarse gravel, silt, rippled sand, two rocky-sediment variants, and shelly sand hash), with roughness and reflectivity jittered per cell so no two cells match. Every green box, name, and seafloor zone outline is generated automatically from the scene file at simulation time, not hand-annotated. The three insets along the right show the same target lifted off three of those backgrounds.
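The chip-cropping step that the prompt asks for can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not ApertureLab's own tooling: the 2.5 cm pixel spacing, the axis order (6 rows along the 60 m side, 10 columns along the 100 m side), and the function name `crop_center_chips` are all hypothetical.

```python
import numpy as np

def crop_center_chips(image, rows=6, cols=10, chip=256):
    """Cut one chip-by-chip window from the middle of every grid cell."""
    cell_h = image.shape[0] / rows
    cell_w = image.shape[1] / cols
    if chip > min(cell_h, cell_w):
        raise ValueError("chip is larger than a grid cell")
    chips = {}
    for r in range(rows):
        for c in range(cols):
            # Pixel coordinates of the cell centre, where the target sits.
            cy = int((r + 0.5) * cell_h)
            cx = int((c + 0.5) * cell_w)
            top, left = cy - chip // 2, cx - chip // 2
            chips[(r, c)] = image[top:top + chip, left:left + chip]
    return chips

# Assumed pixel spacing of 2.5 cm: a 60 m by 100 m scene is 2400 by 4000 px,
# so each 10 m cell is 400 px wide and a 256 px chip covers 6.4 m of it.
pixel_spacing_m = 0.025
image = np.zeros((int(60 / pixel_spacing_m), int(100 / pixel_spacing_m)),
                 dtype=np.float32)  # stand-in for the rendered scene
chips = crop_center_chips(image)
```

Keying the chips by `(row, column)` keeps each crop tied to its grid cell, so the bottom-type label for that cell can be attached afterwards.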

The same physics that renders a labeled scene forward is what trains the model that reads one backward.

Simulate

Scene editor for sonar geometry, seafloor types, objects, and platform motion. GPU ray-trace physics backend for accurate acoustic scattering and shadow geometry.

Beamform

Integrated TDBP and ω-k pipelines produce georeferenced SLC imagery directly in the tool.

Inspect

Interactive viewer with physical-coordinate cursor readout, three output variants, pan/zoom.

How it works

A synthetic aperture sonar return is not a picture. The sensor records raw complex time-series and the platform's own motion; an image only exists after signal processing reconstructs it. The figure below walks the full chain, from recorded I/Q and navigation through pulse compression, motion estimation, and back-projection, to the focused single-look complex image and the seafloor map a person finally reads.

A left-to-right pipeline diagram titled How a SAS/SAR image is made. Raw I/Q time-series and navigation feed into filtering and pulse compression, then motion estimation by micronavigation, then image formation by back-projection, producing a single-look complex image, then post-processing for dynamic range, ending in a human-readable seafloor image. Each stage shows a small example of its data: I/Q traces, a compressed pulse, an estimated sway track, focusing arcs, complex speckle, a tone curve, and a real focused sonar image.

From raw I/Q and navigation to a focused seafloor image, with each stage shown alongside its data at that point in the chain.

This is the chain ApertureLab simulates end to end, forward: a scene file becomes physics-accurate sonar time-series, and the same pipeline focuses it back into a labeled image. SAR and RGB became first-class modalities for AI once the data layer was cheap and abundant; modeling the whole chain is how sonar reaches that scale too.
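The pulse-compression stage of that chain is, in textbook form, a matched filter: the recording is correlated with a replica of the transmitted pulse. The sketch below shows the idea with NumPy. The 75 kHz sample rate is the figure quoted elsewhere on this page; the pulse length, bandwidth, and echo delay are made-up values, and the code is not taken from ApertureLab.

```python
import numpy as np

def lfm_chirp(fs, duration, bandwidth):
    """Complex baseband linear-FM pulse sweeping -B/2 to +B/2."""
    t = np.arange(int(round(fs * duration))) / fs
    rate = bandwidth / duration  # chirp rate in Hz per second
    return np.exp(1j * np.pi * rate * (t - duration / 2) ** 2)

def pulse_compress(iq, replica):
    """Matched filter: cross-correlate the recording with the replica."""
    n = len(iq) + len(replica) - 1  # zero-pad so the correlation does not wrap
    spectrum = np.fft.fft(iq, n) * np.conj(np.fft.fft(replica, n))
    return np.fft.ifft(spectrum)[: len(iq)]

# Hypothetical pulse: 4 ms long, 20 kHz bandwidth, sampled at 75 kHz.
fs, duration, bandwidth = 75_000.0, 0.004, 20_000.0
replica = lfm_chirp(fs, duration, bandwidth)

# One echo at half amplitude, arriving 500 samples after transmit.
delay = 500
iq = np.zeros(2048, dtype=complex)
iq[delay:delay + len(replica)] = 0.5 * replica

compressed = pulse_compress(iq, replica)
peak_index = int(np.argmax(np.abs(compressed)))
```

The 300-sample echo collapses to a narrow peak at the sample where it began, which is what lets the later back-projection stage place energy precisely in range.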

Real aperture vs. synthetic

Side-scan sonar forms each image from a single physical array, so its along-track resolution coarsens with range. Synthetic aperture sonar coherently combines hundreds of pings into one long virtual array, holding fine resolution across the entire swath. It is the same step that carried radar from real aperture to SAR. Read the walkthrough →

One simulated seafloor scene imaged two ways. Top row: real-aperture side-scan sonar at 400 kHz, where a cargo container, a cylinder, and a boat hull are smeared along-track. Bottom row: synthetic aperture sonar at 100 kHz with 32 channels, where the same three objects are sharply resolved with crisp edges and internal structure.

One simulated scene, rendered through both imaging chains. Top: real-aperture side-scan at 400 kHz. Bottom: synthetic aperture at 100 kHz, 32 channels, focused with GPU time-domain back-projection. The crops are matched 9 m windows on three objects: a cargo container, a cylinder, and a boat hull. The side-scan smears edges and internal structure along-track as range grows; the synthetic aperture resolves them at every range.

Both modalities fall out of the same physics engine, from the same scene file; a model can be trained, and cross-modality transfer studied, on either.
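The range behavior of the two modalities follows from two standard first-order approximations: a real aperture resolves roughly range times wavelength over array length, while a stripmap synthetic aperture resolves roughly half the physical element length regardless of range. The sketch below evaluates both. The 400 kHz side-scan frequency comes from the figure above; the 0.5 m array length, 5 cm element length, and 1500 m/s sound speed are assumptions chosen for illustration, not ApertureLab settings.

```python
SOUND_SPEED = 1500.0  # m/s, assumed nominal value for seawater

def real_aperture_resolution(range_m, freq_hz, array_len_m):
    """Along-track resolution of a physical array: range * wavelength / length."""
    wavelength = SOUND_SPEED / freq_hz
    return range_m * wavelength / array_len_m

def synthetic_aperture_resolution(element_len_m):
    """Stripmap SAS along-track resolution: half the element length, at any range."""
    return element_len_m / 2.0

# Hypothetical hardware: 0.5 m side-scan array, 5 cm SAS elements.
ranges_m = [25.0, 50.0, 100.0, 150.0]
side_scan = [real_aperture_resolution(r, 400e3, 0.5) for r in ranges_m]
sas = [synthetic_aperture_resolution(0.05) for _ in ranges_m]

for r, a, b in zip(ranges_m, side_scan, sas):
    print(f"range {r:5.0f} m   side-scan {a:.3f} m   SAS {b:.3f} m")
```

Under these assumptions the side-scan cell grows sixfold between 25 m and 150 m while the SAS cell stays fixed, which is the smearing visible in the top row of the figure.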

Rippled sandy seafloor with multiple objects and acoustic shadows

Rippled sandy seafloor: mines, pipeline sections, and debris with acoustic shadows

Complex wreck and pipeline scene

Complex wreck and pipeline scene: metallic returns and extended shadow geometry

Structural targets including pipes, containers, and equipment

Structural targets on rippled sand: pipes, containers, and caged equipment

Circular SAS 360-degree coverage

Circular SAS pass: 360-degree aspect coverage resolves all target facets simultaneously

Rocky seafloor with isolated target and ripple texture

Rocky seafloor with isolated target: high-relief scattering and long geometric shadow

Dense multi-object scene across mixed seafloor zones

Dense multi-object scene: rocks, mines, cylinders, and debris across mixed seafloor zones

Scene Design

Place objects, paint seafloor zones, configure sonar array geometry and platform motion in the visual editor. No scripting required.

Simulation

A GPU ray tracer computes physics-based acoustic propagation ping by ping. Output is a complex IQ time series (36 channels, 75 kHz sample rate) written in three industry-interchange formats: POSSM-compatible HDF5 (lossless complex IQ, the format used with real sonar hardware); XTF (eXtended Triton Format, the de facto standard for survey-industry sonar tooling); and MSHDF (an open, publicly released sonar-data HDF5 interchange standard). The output therefore drops straight into existing sonar processing pipelines.

Beamforming

Time-Domain Back-Projection (TDBP) processes each ping of the raw time series, compensating for platform motion and applying matched filtering to produce a focused, georeferenced sonar image.
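The core of time-domain back-projection can be sketched in NumPy: for every ping, compute the round-trip delay from the sensor to each pixel, read the recording at that delay, restore the carrier phase, and sum. This is a simplified illustration, not ApertureLab's GPU implementation. It assumes the data are already pulse-compressed and basebanded, a single monostatic channel, exactly known sensor positions (so no micronavigation), and a 1500 m/s sound speed. The 100 kHz carrier and 75 kHz sample rate are figures quoted on this page; the geometry and the Gaussian echo envelope are made up.

```python
import numpy as np

C = 1500.0  # m/s, assumed sound speed

def backproject(data, fs, fc, sensor_xy, pixels_xy):
    """Sum every ping into every pixel at the matching round-trip delay.

    data:      (n_pings, n_samples) complex baseband, pulse-compressed
    sensor_xy: (n_pings, 2) sensor position at each ping
    pixels_xy: (n_pixels, 2) image grid positions
    """
    image = np.zeros(len(pixels_xy), dtype=complex)
    samples = np.arange(data.shape[1])
    for ping, pos in zip(data, sensor_xy):
        delay = 2.0 * np.linalg.norm(pixels_xy - pos, axis=1) / C
        idx = delay * fs  # fractional sample index for each pixel
        # Linear interpolation; pixels outside the recording contribute zero.
        value = (np.interp(idx, samples, ping.real, left=0.0, right=0.0)
                 + 1j * np.interp(idx, samples, ping.imag, left=0.0, right=0.0))
        # Undo the carrier phase that basebanding removed.
        image += value * np.exp(2j * np.pi * fc * delay)
    return image

# Simulate one point target seen from a straight 10 m track.
fs, fc = 75_000.0, 100_000.0
n_pings, n_samples = 101, 4096
sensor_xy = np.column_stack([np.linspace(-5.0, 5.0, n_pings), np.zeros(n_pings)])
target = np.array([0.0, 30.0])
t = np.arange(n_samples) / fs
tau = 2.0 * np.linalg.norm(sensor_xy - target, axis=1) / C
envelope = np.exp(-0.5 * ((t[None, :] - tau[:, None]) / 50e-6) ** 2)
data = envelope * np.exp(-2j * np.pi * fc * tau)[:, None]

# Focus a 2 m by 2 m patch around the target on a 5 cm grid.
xs, ys = np.linspace(-1.0, 1.0, 41), np.linspace(29.0, 31.0, 41)
gx, gy = np.meshgrid(xs, ys)
pixels_xy = np.column_stack([gx.ravel(), gy.ravel()])
image = backproject(data, fs, fc, sensor_xy, pixels_xy).reshape(gx.shape)
row, col = (int(i) for i in np.unravel_index(np.argmax(np.abs(image)), image.shape))
```

At the true target position every ping adds in phase, so the magnitude there approaches the number of pings; elsewhere the phases disagree and the sum stays small. The complex `image` array is the single-look complex product in miniature.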

Output

SLC: Single Look Complex. The full complex-valued beamformed image. Preserves phase; required for coherent change detection, micronavigation, and interferometry. The raw material for advanced processing.

DRC: Dynamic Range Compressed. Tone-mapped magnitude image that renders fine seafloor texture and bright targets simultaneously visible. The format that humans and vision models interpret directly.
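One simple way to derive a DRC-style image from an SLC is a clipped log-magnitude tone map referenced to the median seafloor level. The sketch below shows that generic approach. It is an assumption for illustration, not ApertureLab's actual tone curve, and the function name and the -20 dB and +30 dB limits are hypothetical.

```python
import numpy as np

def dynamic_range_compress(slc, low_db=-20.0, high_db=30.0):
    """Map SLC magnitude to [0, 1] on a dB scale relative to the median."""
    mag = np.abs(slc)
    ref = np.median(mag[mag > 0])  # typical background level
    db = 20.0 * np.log10(np.maximum(mag, 1e-12) / ref)
    return np.clip((db - low_db) / (high_db - low_db), 0.0, 1.0)

# Stand-in SLC: complex Gaussian speckle with one very bright scatterer.
rng = np.random.default_rng(0)
slc = rng.standard_normal((128, 128)) + 1j * rng.standard_normal((128, 128))
slc[64, 64] = 100.0
drc = dynamic_range_compress(slc)
```

The bright scatterer saturates at 1.0 while the speckle background lands near mid-gray, so both stay visible in one 8-bit-friendly image. The phase is discarded, which is why the SLC is kept for coherent processing.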

The scene on the left, defined entirely in the ApertureLab editor, is simulated and beamformed through the TDBP pipeline to produce the image on the right. No external data — physics all the way through.

ApertureLab Texture Editor: a node graph wiring ripple, multifractal, noise, and Voronoi generators into the seafloor output, with a live preview of the resulting rippled height field.

Node-graph seafloor authoring with live height-field preview

Seafloor texture is authored the way Blender artists build materials: as a node graph. Perlin, multifractal, and Voronoi generators compose with calibrated ripple and roughness spectra to drive four channels at once: bathymetry, bottom-type maps, per-cell backscatter gain, and rock placement. Every graph is seeded and parametric, so one recipe regenerates into thousands of statistically distinct seafloors: the domain-randomization loop sim-to-real teams already run, pointed at the ocean floor. The preview evaluates any node's output at full simulation resolution.
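The seeded, parametric idea can be illustrated with a much smaller recipe than a full node graph: a sinusoidal ripple plus power-law roughness, both driven by one random seed. This sketch is a stand-in, not the editor's generators (it implements no Perlin or Voronoi nodes), and every parameter name and default below is hypothetical.

```python
import numpy as np

def seafloor_height(seed, n=256, spacing_m=0.05, ripple_wavelength_m=0.5,
                    ripple_amp_m=0.03, roughness_rms_m=0.01):
    """Return an n-by-n height field in metres; same seed, same seafloor."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[0:n, 0:n] * spacing_m

    # Ripple: a sinusoid with seed-dependent direction and phase.
    theta = rng.uniform(0.0, np.pi)
    phase = rng.uniform(0.0, 2.0 * np.pi)
    along = x * np.cos(theta) + y * np.sin(theta)
    ripple = ripple_amp_m * np.sin(2.0 * np.pi * along / ripple_wavelength_m + phase)

    # Roughness: white noise shaped by a power law in the wavenumber domain.
    freqs = np.fft.fftfreq(n, spacing_m)
    k = np.hypot(*np.meshgrid(freqs, freqs))
    k[0, 0] = np.inf  # zero gain at DC removes the mean
    noise = rng.standard_normal((n, n))
    rough = np.fft.ifft2(np.fft.fft2(noise) * k ** -1.5).real
    rough *= roughness_rms_m / rough.std()

    return ripple + rough

same_a = seafloor_height(seed=7)
same_b = seafloor_height(seed=7)
other = seafloor_height(seed=8)
```

Looping the seed over a range gives the domain-randomization sweep described above: the recipe and its statistics stay fixed while every realization differs.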

Procedurally varied ocean scenes (wrecks, pipelines, debris fields, ripples, pockmarks), each generated end-to-end from a single YAML scene config and the ApertureLab beamforming pipeline.

ApertureLab's physics engine is grounded in peer-reviewed sonar science: GPU-accelerated TDBP beamforming, precision interpolation kernels for sub-wavelength micronavigation, and validated sub-bottom acoustic simulation.

AI & Machine Learning

Physics-accurate synthetic data is the missing ingredient for underwater AI; ApertureLab generates it. The research arc below runs from synthesizing scarce training data with a physics-coupled GAN, to few-shot classifiers trained on synthetic augmentation, to zero-shot VLMs that classify sonar imagery with only a text prompt.

The Full Research Arc

The cards above are highlights. The complete record is 29 publications across 14 years, and it spans the entire chain: physics-based simulation and synthetic data, micronavigation, GPU image formation, learned autofocus, compression, perceptual image quality, and the models that read the imagery. The diagram below places each paper on the stage of the processing chain it advances.

A diagram titled One body of work, every stage of the chain. An eight-stage SAS processing pipeline runs left to right: scene and scattering simulation, acquisition and pulse compression, motion estimation by micronavigation, image formation by back-projection, autofocus, post-processing, perception and complexity, and ATR, segmentation and detection. Below each stage hang publication cards with short titles, venues, and years, each connected by a line and dot to the stage it advances. Autofocus carries four papers and the final AI stage carries fifteen. A dashed arc over the chain reads: simulated data and physics priors feed the models.

29 publications, 2012 to 2026, mapped onto the SAS processing chain; each line ties a paper to the stage it advances.

Synthetic aperture sonar is to the ocean what satellite imagery is to the land, except that the ocean floor remains almost entirely unmapped at the resolution needed for autonomous systems to act on it. The tools to change that are only now becoming mature enough to pair with modern AI.

ApertureLab was built to close that gap: a simulation environment where researchers can prototype sonar missions, generate training data for underwater object recognition models, and validate beamforming algorithms, all without access to a ship.

Isaac D. Gerg, Ph.D.

AI Scientist, ClimateAI

If you're building AI systems that need to understand the physical world, I'd like to talk.

isaac.gerg@gergltd.com

More at gergltd.com · see the software in detail →