The modern media processing engine that gives you 100% of FFmpeg's power and performance — with a visual editor, bulletproof validation, live observability, pure-Go extensibility, and an intelligent real-time adaptive controller.
MediaMolder is a ground-up redesign of FFmpeg’s interface and orchestration layers. It uses the same battle-tested libav* libraries (libavcodec, libavformat, libavfilter, x264, x265, etc.) but replaces complex, error-prone command-line strings with graphs defined in JSON files.
MediaMolder Advantages
| Feature | MediaMolder | FFmpeg | GStreamer |
|---|---|---|---|
| Graph visualization | Drag-and-drop browser GUI | None | Manual Graphviz |
| Pre-run validation | Static + probe-assisted, one-click auto-fix | None | Limited |
| Live observability | Per-node metrics, Prometheus, OTEL, live terminal | None | Basic |
| Real-time adaptive controller | Full adaptive loop (threads + presets + frame drop + jitter buffers) | None | Limited |
| Extensibility | Pure Go Processor interface |
C filters (rebuild required) | C plugins |
| In-graph video editing | Multi-track timeline node — cuts, transitions, audio crossfades | Manual filter graphs | Separate GES library |
| Hardware acceleration | Probed, auto-mapped, safe across platforms | Opaque & error-prone | Complex |
| Declarative config | Versioned JSON + GUI round-trip | Command strings | Pipeline code |
| FFmpeg command migration | One-command convert-cmd with round-trip |
N/A | Manual |
| Production readiness | Pause/resume, real-time controller, embeddable | Good for scripts | Good for pipelines |
| Remote & distributed execution | Same JSON runs locally, on a remote GPU server, or across a fault-tolerant cluster | Separate tool required | Not built-in |
Every codec, filter, container, and hardware backend FFmpeg offers, with significantly better usability, safety, observability, and real-time reliability.
When a graph decodes and re-encodes video, MediaMolder follows FFmpeg's encoder
boundary behavior: source frame types are not reused as encoder commands.
Keyframes/IDRs are inserted by explicit graph controls such as
force_key_frames and processor-generated scene-change markers.
Real Time Encoding with Proper Metrics
MediaMolder GUI — Adaptive Bitrate x264 encode with real-time controller active
- Import any
ffmpegcommand line instantly - Drag filters, encoders, sources, and sinks onto a canvas
- Set parameter values in an inspector panel with extended help for each parameter
- Validate your job to catch problems before you run it
- MediaMolder can suggest and implement fixes to common problems
- Hover edges for full stream metadata
- Live performance metrics while running
- In real-time mode, the Real-time controller panel shows detailed statistics (threads, presets, frame drops)
- Export back to the equivalent FFmpeg command line
Real-Time Controller
Activate with --realtime (CLI) or global_options.realtime: true (JSON) and MediaMolder turns on an adaptive closed-loop control system that enables reliable live video encoding**.
- Adaptive control loop (500 ms ticks) continuously monitors the performance of every encoder
- Three-tier adaptation:
- Scale encoder threads (graceful restart, within CPU budget)
- Increments presets faster/slower (GOP-boundary switching, quality recovery when load drops) to optimize speed vs. quality
- Graceful frame drop (as a last resort)
- Configurable encoder input buffer (~4 s) + rolling output buffer (~4 s) absorbs upstream and downstream jitter (TCP stalls, HLS segment hiccups, SRT bursts, etc.)
- Live status badges in GUI +
mediamolder watch+ HTTP/SSE API - Perfect for live streaming, HLS/DASH playout, broadcast, and any long-running job that must stay on pace
Graphical User Interface / Visual editor
FFmpeg runs media processing graphs, but until now you would have to visualize
those graphs in your head. MediaMolder can import your FFmpeg command-line, and the GUI
enables you to view, edit, validate, and run your graph with detailed performance
metrics. The MediaMolder Graphical User Interface (GUI) is a fluid,
drag-and-drop graph editor that runs in your web browser. The GUI is launched
from the mediamolder binary by the gui subcommand. For details, see gui.md
-
Build encode graphs by dragging filters, encoders, sources, and sinks onto a canvas and wiring them by stream type. Mismatched types (video → audio input) are rejected at the handle level.
-
The Inspector displays typed forms for every node: encoder rate-control modes, HLS/DASH delivery wizards, bitstream-filter chains, chapter and container metadata editors, per-stream disposition and language overrides, audio channel routing.
-
When you select an input file, it is probed to determine its technical parameters.
-
Hover any edge (wire) to see every technical property MediaMolder can infer for that stream (resolution, pixel format, frame rate, colour space, codec, bitrate, sample rate, channel layout) — seeded from a probe of the source file and propagated forward through the graph.
-
FFmpeg -> parses any
ffmpegcommand line and drops the equivalent graph onto the canvas. -
-> FFmpeg shows you the equivalent FFmpeg command-line for your MediaMolder graph (warning if your graph contains MediaMolder-exclusive capabilities, like custom Go Processor Nodes).
-
The Run panel shows live per-node metrics — packets, rate, error count, mean frame latency, and unblocked performance (the rate each node achieves while actively processing, idle and stall time excluded).
-
MediaMolder graphs are saved as JSON files that can be run by passing the JSON to the MediaMolder binary as a single command-line argument.
-
MediaMolder saves the position of every node in your graph layout, and it saves the technical metadata of the source media if the source files are defined in the job.
-
The properties panel includes extended help for most parameters, explaining the effect of each option, the default value, and the valid range. Parameters that accept an enumerated list of values (e.g.
hwaccel) are controlled by a dropdown menu that lets you select a valid value.
Safe by default
MediaMolder validates your graph before the first frame is touched.
mediamolder validate (and the GUI's inline annotations) run a
static + probe-assisted analysis pass that catches every class of problem that
would cause FFmpeg to crash silently or produce unusable output hours into a
job: graph topology errors, codec/container incompatibilities, pixel-format
mismatches, hardware boundary violations, HDR without tone-mapping, interlaced
sources without a deinterlacer, VFR streams without an fps filter, odd
dimensions rejected by encoders, and more. Every issue is reported in a single
pass with a human-readable message, an ERROR/WARNING/INFO severity, and the
exact node and edge where the problem occurs.
Where the fix is unambiguous, the GUI offers one-click automated remediation
— auto-insert yadif/bwdif for interlaced sources, tonemap/zscale for
HDR→SDR conversions, fps/format/scale adapters at incompatible
boundaries, hwupload/hwdownload at hardware device transitions. You see the
problem and its fix before committing any compute time.
Observable at every level
MediaMolder was designed for long-running and production jobs where "check after it finishes" is not an option.
- Per-node performance tracking (
NodePerfTracker) records each node's active, idle, and stalled fractions, windowed FPS vs. target, stall count and duration, per-frame processing latency, and — for decoder nodes — the libavcodec thread pool fill (threads_busy). The bottleneck node and its constraint are always visible. - Prometheus metrics for every node and graph: 20+ gauges, counters, and histograms covering frames, errors, bitrate, frame latency, FPS, queue fill, CPU core estimates, and thread visibility.
/perfand/perf/streamHTTP endpoints expose the per-node snapshot as JSON on demand or as a 2 Hz Server-Sent Events stream for dashboards.mediamolder perfrenders a live colour-coded terminal table — green when nodes meet their FPS target, amber/red when they fall behind — with no extra tooling required.- OpenTelemetry span wiring: every graph run and every handler goroutine emits a child span so your existing distributed trace shows exactly where decode/filter/encode time goes.
Extensible in pure Go
Custom processing logic — object detection, AI filters, scene detection,
subtitle generation, business-specific metadata — slots into any graph as a
first-class node, written as an ordinary Go struct that implements the
processors.Processor interface. No C, no rebuilds, no filtergraph string
hacks. The engine schedules, monitors, and error-handles custom nodes
identically to built-in nodes. For more details, see go-processor-nodes.md.
Processors may also implement the optional FrameSource sub-interface to generate their own frames rather than processing inbound ones — the built-in sequence_editor timeline node is one example (see Video editing built in).
You can add a custom Yolo-v8 object-detection node to a graph and it will run directly inside your media graph. See Yolo-V8 Guide
For multimodal scene understanding — captions, temporal grounding, edit plans, QA — the built-in vidi_analyzer node connects any graph to a Vidi 2.5 inference service. See Vidi 2.5 Guide
For cloud-hosted video understanding — index, search, caption, and embed clips via the TwelveLabs Marengo and Pegasus models — the twelvelabs_indexer, twelvelabs_analyzer, twelvelabs_searcher, and twelvelabs_embedder nodes are built in. See TwelveLabs Guide
For local, offline speech-to-text, the built-in whisper_stt node transcribes an audio stream to timestamped subtitles (SRT/VTT/JSON/TXT) with whisper.cpp. See Whisper Speech-to-Text Guide
For true camera-RAW develop (NEF/CR2/CR3/ARW/RAF/ORF/RW2/PEF/SRW/DNG) to a full, demosaicked 8-bit sRGB image via LibRaw — not the camera's embedded JPEG preview, and not libav's black RAW render — use the mediamolder raw-decode command or the built-in raw_decode node inside a graph. Deterministic develop (camera white balance, sRGB, AHD); LibRaw is bundled from pinned source and linked statically. See Camera-RAW Decode Guide
For native face analysis — detect faces (YOLOv8-face), align each, and optionally embed them (SFace) for recognition/clustering — use the mediamolder face-detect command for images/video or the built-in face_detect node inside a graph. Reproducible embeddings; models loaded as data, never linked. See Face Detection Guide
Video editing built in
Assemble clips into a finished video — cuts, trims, wipes, dissolves, layering,
and audio crossfades — entirely inside a job graph, with no external NLE. The
built-in sequence_editor node is a multi-track timeline that opens its own
sources and emits a finished stream: place clips on tracks at explicit times, add
transitions composited by a native Go engine
(the full xfade set), and mix audio from the same clips with crossfades
auto-coupled to each cut. Build it as a spreadsheet-style table in the GUI or as
declarative JSON. See the Video Editing Guide.
Hardware acceleration — any platform, properly
MediaMolder makes hardware acceleration safe and understandable. See hardware-acceleration.md
- A Hardware Capabilities dialog probes all available backends at startup and displays each GPU's marketing name, supported encode/decode codecs grouped by media type, capability notes (max resolution, 10-bit, B-frames, concurrent session limits), and a diagnostic message for any backend that failed to open.
- Per-input, per-stream hardware decode control with a live scope hint in the Inspector: "HW decode: video (prores_ap4x) · SW fallback: audio" — so you know exactly what goes to the GPU before you run.
- Automatic hardware filter mapping: assign a CUDA device to a
scalenode, tick Auto-map to hardware filter, and the runtime promotes it toscale_cudaand insertshwupload/hwdownloadat device boundaries. - Apple ProRes RAW hardware decode via VideoToolbox — including ProRes RAW HQ and ProRes 4444 XQ — codecs that FFmpeg's VideoToolbox binding does not expose.
Production-grade infrastructure
- Declarative, version-controlled graphs. JSON files are diffable, database-storable, reliably generated programmatically, and fully schema- validated (v1.0/v1.1). The graph layout (node positions) round-trips through the GUI without polluting the runtime config.
- Full timing control.
-ss/-t/-toat input and output scope, a faithful Go port of FFmpeg's demuxer trim logic,av_parse_timestring parsing, and per-encoder time-base control. - Graph state machine with live pause/resume, graceful cancellation via
context.Context, per-node error policies, and a structured event bus. Suitable for live streams and unattended overnight jobs alike. - Trivially embeddable. The CLI and GUI are thin consumers of a clean Go API. Drop the engine into any service or CI/CD graph with a single import.
Remote and distributed execution
MediaMolder is built from the ground up to run jobs anywhere — on your laptop, on a single remote GPU box, or across a horizontally-scaled cluster — without changing a line of your graph JSON.
Three deployment tiers, one binary, one job format:
| Tier | How to start | Best for |
|---|---|---|
| Local | mediamolder run job.json |
Development, single-machine encodes |
| Tier 1 — remote server | mediamolder serve --mode=server |
Run jobs on a more powerful machine; GUI/CLI stays on your laptop |
| Tier 2 — distributed cluster | --mode=api + --mode=worker |
Scene-parallel encoding, fault tolerance, cloud autoscaling |
Why this matters for production workloads:
- Zero job-format migration. The same JSON config file that runs locally
is submitted verbatim to a remote server or cluster. The
Distributionblock for Tier 2 fan-out is purely additive — existing jobs run unmodified. - Fault-tolerant task execution. In Tier 2, every encode task is retried
automatically up to a configurable
max_attemptsbefore being moved to an inspectable dead-letter queue. A crashed worker never loses a job. - Scene-parallel encoding. The
fanout_dynamicstrategy splits a source into segments at scene boundaries, encodes them in parallel across workers, then stitches the outputs with thegatherstrategy — dramatically reducing wall-clock time for long-form content without changing the output. - Capability-aware routing. Workers advertise their hardware (
cuda,h264_nvenc,hevc_nvenc, …) and region. Tasks declare what they require. GPU tasks are automatically routed to GPU workers; CPU-only workers pick up everything else. No manual queue partitioning needed. - S3 presigning at the server boundary. Submit jobs with
s3://URIs; the server converts them to short-lived HTTPS presigned URLs before execution starts. Workers never hold AWS credentials, and credentials never travel in job JSON. - Local file upload. Enable
--enable-uploadsto let GUI users and CLI scripts push local files directly to the server before submitting the job. Files are stored in a scoped--workdirand deleted on job completion. - Pluggable state and queue backends. Run in-memory + SQLite for local development; swap to Postgres + NATS JetStream or DynamoDB + SQS for multi-instance cloud deployments with no code changes.
- Strong authentication built in. Choose a static bearer token
(
--auth-token-file), OIDC JWTs from any standards-compliant provider (--oidc-issuer), or mTLS client certificates (--mtls-ca). All three compose freely;/healthzand/readyzremain unauthenticated for load-balancer probes. - Distributed tracing end-to-end. Tier 2 propagates OpenTelemetry span context through the queue automatically — task spans appear as children of the originating job span in your trace backend, giving full visibility from HTTP request to encoded frame.
- GUI-first remote workflow. Click Backend in the toolbar, enter a server URL and token, and every subsequent Run is sent to the remote machine. Switch back to local with one click.
For setup instructions see Remote Backend Guide.
Drop-in FFmpeg migration
mediamolder convert-cmd turns any FFmpeg command line into a validated JSON
config in one step: rate-control flags, per-stream maps, stream-copy nodes,
tee/HLS/DASH muxers, bitstream filters, hardware devices, cover-art and
attachment handling, -map_metadata/-map_chapters, two-pass encoding, and
more — all converted with high fidelity and covered by round-trip regression
tests. The generated graph runs immediately; the Inspector shows every option
the conversion inferred so you can review and adjust.
See FFmpeg Migration Guide
MediaMolder gives you 100% of FFmpeg's media processing capabilities — every codec, filter, hardware backend, and container format — with a graph model that validates before it runs, shows you what's happening while it runs, and tells you exactly what went wrong when it doesn't.
Prerequisites
- Go 1.23+
- FFmpeg 8.1+ (libavcodec 62.x, libavformat 62.x, libavfilter 11.x, libavutil 60.x)
- Either a system install (via Homebrew, apt, etc.) with
pkg-configavailable, or a source build in a sibling directory (see static build below)
- Either a system install (via Homebrew, apt, etc.) with
- pkg-config (if using system FFmpeg)
- Git LFS (for the media test corpus, when available):
git lfs install
Build / Install
For detailed instructions see MacOS, Windows and Linux
Documentation
Usage
- Using MediaMolder (CLI)
- Visual Editor (GUI)
- Remote Backend Guide — run your jobs on a single remote server or a distributed cluster with API + workers
- Concepts — Graph Model, Nodes, Edges, Lifecycle
- JSON Config Reference
- FFmpeg Migration Guide
- Export to FFmpeg CLI
- Validation
- Video Editing Guide — assemble clips into a finished video: multi-track timelines, cuts/trims, dissolves and the full transition set with audio crossfades (
sequence_editor) - Go Processor Nodes —
Processorinterface,FrameSourceinterface, built-in processors (sequence_editor, scene detectors, TwelveLabs, Vidi, …), writing custom nodes - Scene Detection — seven detectors: go-scene-detect ports (content, adaptive, threshold, hash, histogram), FFmpeg scdet, and the motion-compensated
scene_change_mc(frame-accurate dissolve/fade detection) - Yolov8 object detection/classification
- Vidi 2.5 multimodal analysis
- TwelveLabs video understanding
- Whisper speech-to-text — local, offline transcription to SRT/VTT/JSON/TXT (
whisper_stt) - Camera-RAW Decode — develop NEF/CR2/CR3/ARW/RAF/ORF/RW2/PEF/SRW/DNG to 8-bit sRGB via bundled LibRaw (
raw-decodeCLI +raw_decodenode) - Face Detection — detect, align, and embed faces in images/video for recognition/clustering (
face-detectCLI +face_detectnode) - Real-Time Controller — adaptive control loop, encoder preset stepping, output buffers,
mediamolder watch, HTTP API
Code
- Architecture
- Graph State Machine
- Graph Instrumentation Roadmap
- Clock & Sync
- Event Bus
- Error Handling
- Hardware Acceleration
- Observability — Prometheus metrics, OpenTelemetry tracing, per-node performance monitoring,
mediamolder perfCLI - Graph Compilation
- Video Transitions — native Go transition engine (wipes, slides, fades, circles, …) that replaces libavfilter
xfade - Camera-RAW Decode — LibRaw develop boundary, the three decode intents, determinism scoping, static-link rationale
Project
- MediaMolder Project
- Contribution & Governance
- Project Specification
- Benchmarks —
mediamolder hwbenchuser tool + Go graph CI benchmarks - Licensing