GitHub - MediaMolder/mediamolder: A media processing framework written in Go.

14 min read Original article ↗

The modern media processing engine that gives you 100% of FFmpeg's power and performance — with a visual editor, bulletproof validation, live observability, pure-Go extensibility, and an intelligent real-time adaptive controller.

Go License FFmpeg Powered

MediaMolder is a ground-up redesign of FFmpeg’s interface and orchestration layers. It uses the same battle-tested libav* libraries (libavcodec, libavformat, libavfilter, x264, x265, etc.) but replaces complex, error-prone command-line strings with graphs defined in JSON files.


MediaMolder Advantages

Feature MediaMolder FFmpeg GStreamer
Graph visualization Drag-and-drop browser GUI None Manual Graphviz
Pre-run validation Static + probe-assisted, one-click auto-fix None Limited
Live observability Per-node metrics, Prometheus, OTEL, live terminal None Basic
Real-time adaptive controller Full adaptive loop (threads + presets + frame drop + jitter buffers) None Limited
Extensibility Pure Go Processor interface C filters (rebuild required) C plugins
In-graph video editing Multi-track timeline node — cuts, transitions, audio crossfades Manual filter graphs Separate GES library
Hardware acceleration Probed, auto-mapped, safe across platforms Opaque & error-prone Complex
Declarative config Versioned JSON + GUI round-trip Command strings Pipeline code
FFmpeg command migration One-command convert-cmd with round-trip N/A Manual
Production readiness Pause/resume, real-time controller, embeddable Good for scripts Good for pipelines
Remote & distributed execution Same JSON runs locally, on a remote GPU server, or across a fault-tolerant cluster Separate tool required Not built-in

Every codec, filter, container, and hardware backend FFmpeg offers, with significantly better usability, safety, observability, and real-time reliability.

When a graph decodes and re-encodes video, MediaMolder follows FFmpeg's encoder boundary behavior: source frame types are not reused as encoder commands. Keyframes/IDRs are inserted by explicit graph controls such as force_key_frames and processor-generated scene-change markers.

Real Time Encoding with Proper Metrics

MediaMolder GUI — Adaptive Bitrate x264 encode with real-time controller activeVideo Encoding Real Time Controller

  • Import any ffmpeg command line instantly
  • Drag filters, encoders, sources, and sinks onto a canvas
  • Set parameter values in an inspector panel with extended help for each parameter
  • Validate your job to catch problems before you run it
    • MediaMolder can suggest and implement fixes to common problems
  • Hover edges for full stream metadata
  • Live performance metrics while running
  • In real-time mode, the Real-time controller panel shows detailed statistics (threads, presets, frame drops)
  • Export back to the equivalent FFmpeg command line

Real-Time Controller

Activate with --realtime (CLI) or global_options.realtime: true (JSON) and MediaMolder turns on an adaptive closed-loop control system that enables reliable live video encoding**.

  • Adaptive control loop (500 ms ticks) continuously monitors the performance of every encoder
  • Three-tier adaptation:
    1. Scale encoder threads (graceful restart, within CPU budget)
    2. Increments presets faster/slower (GOP-boundary switching, quality recovery when load drops) to optimize speed vs. quality
    3. Graceful frame drop (as a last resort)
  • Configurable encoder input buffer (~4 s) + rolling output buffer (~4 s) absorbs upstream and downstream jitter (TCP stalls, HLS segment hiccups, SRT bursts, etc.)
  • Live status badges in GUI + mediamolder watch + HTTP/SSE API
  • Perfect for live streaming, HLS/DASH playout, broadcast, and any long-running job that must stay on pace

Graphical User Interface / Visual editor

FFmpeg runs media processing graphs, but until now you would have to visualize those graphs in your head. MediaMolder can import your FFmpeg command-line, and the GUI enables you to view, edit, validate, and run your graph with detailed performance metrics. The MediaMolder Graphical User Interface (GUI) is a fluid, drag-and-drop graph editor that runs in your web browser. The GUI is launched from the mediamolder binary by the gui subcommand. For details, see gui.md

  • Build encode graphs by dragging filters, encoders, sources, and sinks onto a canvas and wiring them by stream type. Mismatched types (video → audio input) are rejected at the handle level.

  • The Inspector displays typed forms for every node: encoder rate-control modes, HLS/DASH delivery wizards, bitstream-filter chains, chapter and container metadata editors, per-stream disposition and language overrides, audio channel routing.

  • When you select an input file, it is probed to determine its technical parameters.

  • Hover any edge (wire) to see every technical property MediaMolder can infer for that stream (resolution, pixel format, frame rate, colour space, codec, bitrate, sample rate, channel layout) — seeded from a probe of the source file and propagated forward through the graph.

  • FFmpeg -> parses any ffmpeg command line and drops the equivalent graph onto the canvas.

  • -> FFmpeg shows you the equivalent FFmpeg command-line for your MediaMolder graph (warning if your graph contains MediaMolder-exclusive capabilities, like custom Go Processor Nodes).

  • The Run panel shows live per-node metrics — packets, rate, error count, mean frame latency, and unblocked performance (the rate each node achieves while actively processing, idle and stall time excluded).

  • MediaMolder graphs are saved as JSON files that can be run by passing the JSON to the MediaMolder binary as a single command-line argument.

  • MediaMolder saves the position of every node in your graph layout, and it saves the technical metadata of the source media if the source files are defined in the job.

  • The properties panel includes extended help for most parameters, explaining the effect of each option, the default value, and the valid range. Parameters that accept an enumerated list of values (e.g. hwaccel) are controlled by a dropdown menu that lets you select a valid value.

Safe by default

MediaMolder validates your graph before the first frame is touched.

mediamolder validate (and the GUI's inline annotations) run a static + probe-assisted analysis pass that catches every class of problem that would cause FFmpeg to crash silently or produce unusable output hours into a job: graph topology errors, codec/container incompatibilities, pixel-format mismatches, hardware boundary violations, HDR without tone-mapping, interlaced sources without a deinterlacer, VFR streams without an fps filter, odd dimensions rejected by encoders, and more. Every issue is reported in a single pass with a human-readable message, an ERROR/WARNING/INFO severity, and the exact node and edge where the problem occurs.

Where the fix is unambiguous, the GUI offers one-click automated remediation — auto-insert yadif/bwdif for interlaced sources, tonemap/zscale for HDR→SDR conversions, fps/format/scale adapters at incompatible boundaries, hwupload/hwdownload at hardware device transitions. You see the problem and its fix before committing any compute time.

Observable at every level

MediaMolder was designed for long-running and production jobs where "check after it finishes" is not an option.

  • Per-node performance tracking (NodePerfTracker) records each node's active, idle, and stalled fractions, windowed FPS vs. target, stall count and duration, per-frame processing latency, and — for decoder nodes — the libavcodec thread pool fill (threads_busy). The bottleneck node and its constraint are always visible.
  • Prometheus metrics for every node and graph: 20+ gauges, counters, and histograms covering frames, errors, bitrate, frame latency, FPS, queue fill, CPU core estimates, and thread visibility.
  • /perf and /perf/stream HTTP endpoints expose the per-node snapshot as JSON on demand or as a 2 Hz Server-Sent Events stream for dashboards.
  • mediamolder perf renders a live colour-coded terminal table — green when nodes meet their FPS target, amber/red when they fall behind — with no extra tooling required.
  • OpenTelemetry span wiring: every graph run and every handler goroutine emits a child span so your existing distributed trace shows exactly where decode/filter/encode time goes.

Extensible in pure Go

Custom processing logic — object detection, AI filters, scene detection, subtitle generation, business-specific metadata — slots into any graph as a first-class node, written as an ordinary Go struct that implements the processors.Processor interface. No C, no rebuilds, no filtergraph string hacks. The engine schedules, monitors, and error-handles custom nodes identically to built-in nodes. For more details, see go-processor-nodes.md.

Processors may also implement the optional FrameSource sub-interface to generate their own frames rather than processing inbound ones — the built-in sequence_editor timeline node is one example (see Video editing built in).

You can add a custom Yolo-v8 object-detection node to a graph and it will run directly inside your media graph. See Yolo-V8 Guide

For multimodal scene understanding — captions, temporal grounding, edit plans, QA — the built-in vidi_analyzer node connects any graph to a Vidi 2.5 inference service. See Vidi 2.5 Guide

For cloud-hosted video understanding — index, search, caption, and embed clips via the TwelveLabs Marengo and Pegasus models — the twelvelabs_indexer, twelvelabs_analyzer, twelvelabs_searcher, and twelvelabs_embedder nodes are built in. See TwelveLabs Guide

For local, offline speech-to-text, the built-in whisper_stt node transcribes an audio stream to timestamped subtitles (SRT/VTT/JSON/TXT) with whisper.cpp. See Whisper Speech-to-Text Guide

For true camera-RAW develop (NEF/CR2/CR3/ARW/RAF/ORF/RW2/PEF/SRW/DNG) to a full, demosaicked 8-bit sRGB image via LibRaw — not the camera's embedded JPEG preview, and not libav's black RAW render — use the mediamolder raw-decode command or the built-in raw_decode node inside a graph. Deterministic develop (camera white balance, sRGB, AHD); LibRaw is bundled from pinned source and linked statically. See Camera-RAW Decode Guide

For native face analysis — detect faces (YOLOv8-face), align each, and optionally embed them (SFace) for recognition/clustering — use the mediamolder face-detect command for images/video or the built-in face_detect node inside a graph. Reproducible embeddings; models loaded as data, never linked. See Face Detection Guide

Video editing built in

Assemble clips into a finished video — cuts, trims, wipes, dissolves, layering, and audio crossfades — entirely inside a job graph, with no external NLE. The built-in sequence_editor node is a multi-track timeline that opens its own sources and emits a finished stream: place clips on tracks at explicit times, add transitions composited by a native Go engine (the full xfade set), and mix audio from the same clips with crossfades auto-coupled to each cut. Build it as a spreadsheet-style table in the GUI or as declarative JSON. See the Video Editing Guide.

Hardware acceleration — any platform, properly

MediaMolder makes hardware acceleration safe and understandable. See hardware-acceleration.md

  • A Hardware Capabilities dialog probes all available backends at startup and displays each GPU's marketing name, supported encode/decode codecs grouped by media type, capability notes (max resolution, 10-bit, B-frames, concurrent session limits), and a diagnostic message for any backend that failed to open.
  • Per-input, per-stream hardware decode control with a live scope hint in the Inspector: "HW decode: video (prores_ap4x) · SW fallback: audio" — so you know exactly what goes to the GPU before you run.
  • Automatic hardware filter mapping: assign a CUDA device to a scale node, tick Auto-map to hardware filter, and the runtime promotes it to scale_cuda and inserts hwupload/hwdownload at device boundaries.
  • Apple ProRes RAW hardware decode via VideoToolbox — including ProRes RAW HQ and ProRes 4444 XQ — codecs that FFmpeg's VideoToolbox binding does not expose.

Production-grade infrastructure

  • Declarative, version-controlled graphs. JSON files are diffable, database-storable, reliably generated programmatically, and fully schema- validated (v1.0/v1.1). The graph layout (node positions) round-trips through the GUI without polluting the runtime config.
  • Full timing control. -ss/-t/-to at input and output scope, a faithful Go port of FFmpeg's demuxer trim logic, av_parse_time string parsing, and per-encoder time-base control.
  • Graph state machine with live pause/resume, graceful cancellation via context.Context, per-node error policies, and a structured event bus. Suitable for live streams and unattended overnight jobs alike.
  • Trivially embeddable. The CLI and GUI are thin consumers of a clean Go API. Drop the engine into any service or CI/CD graph with a single import.

Remote and distributed execution

MediaMolder is built from the ground up to run jobs anywhere — on your laptop, on a single remote GPU box, or across a horizontally-scaled cluster — without changing a line of your graph JSON.

Three deployment tiers, one binary, one job format:

Tier How to start Best for
Local mediamolder run job.json Development, single-machine encodes
Tier 1 — remote server mediamolder serve --mode=server Run jobs on a more powerful machine; GUI/CLI stays on your laptop
Tier 2 — distributed cluster --mode=api + --mode=worker Scene-parallel encoding, fault tolerance, cloud autoscaling

Why this matters for production workloads:

  • Zero job-format migration. The same JSON config file that runs locally is submitted verbatim to a remote server or cluster. The Distribution block for Tier 2 fan-out is purely additive — existing jobs run unmodified.
  • Fault-tolerant task execution. In Tier 2, every encode task is retried automatically up to a configurable max_attempts before being moved to an inspectable dead-letter queue. A crashed worker never loses a job.
  • Scene-parallel encoding. The fanout_dynamic strategy splits a source into segments at scene boundaries, encodes them in parallel across workers, then stitches the outputs with the gather strategy — dramatically reducing wall-clock time for long-form content without changing the output.
  • Capability-aware routing. Workers advertise their hardware (cuda, h264_nvenc, hevc_nvenc, …) and region. Tasks declare what they require. GPU tasks are automatically routed to GPU workers; CPU-only workers pick up everything else. No manual queue partitioning needed.
  • S3 presigning at the server boundary. Submit jobs with s3:// URIs; the server converts them to short-lived HTTPS presigned URLs before execution starts. Workers never hold AWS credentials, and credentials never travel in job JSON.
  • Local file upload. Enable --enable-uploads to let GUI users and CLI scripts push local files directly to the server before submitting the job. Files are stored in a scoped --workdir and deleted on job completion.
  • Pluggable state and queue backends. Run in-memory + SQLite for local development; swap to Postgres + NATS JetStream or DynamoDB + SQS for multi-instance cloud deployments with no code changes.
  • Strong authentication built in. Choose a static bearer token (--auth-token-file), OIDC JWTs from any standards-compliant provider (--oidc-issuer), or mTLS client certificates (--mtls-ca). All three compose freely; /healthz and /readyz remain unauthenticated for load-balancer probes.
  • Distributed tracing end-to-end. Tier 2 propagates OpenTelemetry span context through the queue automatically — task spans appear as children of the originating job span in your trace backend, giving full visibility from HTTP request to encoded frame.
  • GUI-first remote workflow. Click Backend in the toolbar, enter a server URL and token, and every subsequent Run is sent to the remote machine. Switch back to local with one click.

For setup instructions see Remote Backend Guide.

Drop-in FFmpeg migration

mediamolder convert-cmd turns any FFmpeg command line into a validated JSON config in one step: rate-control flags, per-stream maps, stream-copy nodes, tee/HLS/DASH muxers, bitstream filters, hardware devices, cover-art and attachment handling, -map_metadata/-map_chapters, two-pass encoding, and more — all converted with high fidelity and covered by round-trip regression tests. The generated graph runs immediately; the Inspector shows every option the conversion inferred so you can review and adjust. See FFmpeg Migration Guide


MediaMolder gives you 100% of FFmpeg's media processing capabilities — every codec, filter, hardware backend, and container format — with a graph model that validates before it runs, shows you what's happening while it runs, and tells you exactly what went wrong when it doesn't.

Prerequisites

  • Go 1.23+
  • FFmpeg 8.1+ (libavcodec 62.x, libavformat 62.x, libavfilter 11.x, libavutil 60.x)
    • Either a system install (via Homebrew, apt, etc.) with pkg-config available, or a source build in a sibling directory (see static build below)
  • pkg-config (if using system FFmpeg)
  • Git LFS (for the media test corpus, when available): git lfs install

Build / Install

See Build & Packaging

For detailed instructions see MacOS, Windows and Linux

Documentation

Usage

Code

Project

Star History

Star History Chart