ZxClip | Edit audio like text on macOS


Everything you need

Record, transcribe, edit, and export audio and video — without leaving the app.

Core workflow

Edit your words,
or your waveform

Transcribe any recording and edit the text like a document. Words you delete are automatically removed from the audio — no timeline scrubbing needed.
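To make the idea concrete, here is a minimal sketch of how deleting words in a transcript could translate into audio cuts. It assumes each transcribed word carries start and end timestamps; the `Word` type and the `keep_ranges` helper are hypothetical names for illustration, not ZxClip's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


def keep_ranges(words, deleted, duration):
    """Return the (start, end) audio ranges that remain after the
    words at the given indices are deleted from the transcript."""
    cuts = sorted((words[i].start, words[i].end) for i in deleted)
    ranges = []
    cursor = 0.0
    for start, end in cuts:
        if start > cursor:
            ranges.append((cursor, start))  # audio before this cut survives
        cursor = max(cursor, end)           # skip past the deleted word
    if cursor < duration:
        ranges.append((cursor, duration))   # tail after the last cut
    return ranges


words = [Word("So", 0.0, 0.4), Word("uhm", 0.4, 0.9), Word("hello", 0.9, 1.5)]
print(keep_ranges(words, {1}, duration=2.0))  # [(0.0, 0.4), (0.9, 2.0)]
```

Concatenating the kept ranges in order yields the edited audio, which is why no timeline scrubbing is needed in this model.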

ZxClip transcript editor showing editable text synced with audio

ZxClip detecting and highlighting filler words in the transcript

Detect filler words

Kill the "uhm" and "you know" in seconds

The app detects filler words and speech disfluencies in your transcript and marks them for removal — all running on your device, with no data sent anywhere.

Smart silence detection

Dead air and gaps,
gone in one tap

Automatically detect silent gaps and moments where no one's speaking. Review what's flagged and clear it all at once — no manual scrubbing.
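One common way to find silent gaps is to measure loudness over short windows and flag long quiet runs. The sketch below uses that energy-threshold approach; it is not necessarily how ZxClip does it, and the function name, window size, threshold and minimum gap length are all assumptions chosen for illustration.

```python
import math


def find_silences(samples, sample_rate, window_s=0.05,
                  threshold=0.01, min_silence_s=0.3):
    """Return (start, end) times in seconds of runs whose windowed RMS
    stays below `threshold` for at least `min_silence_s` seconds."""
    window = max(1, round(sample_rate * window_s))
    silences = []
    run_start = None  # sample index where the current quiet run began

    for offset in range(0, len(samples), window):
        chunk = samples[offset:offset + window]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if rms < threshold:
            if run_start is None:
                run_start = offset
        elif run_start is not None:
            if (offset - run_start) / sample_rate >= min_silence_s:
                silences.append((run_start / sample_rate, offset / sample_rate))
            run_start = None

    # Close a quiet run that lasts until the end of the recording.
    if run_start is not None:
        if (len(samples) - run_start) / sample_rate >= min_silence_s:
            silences.append((run_start / sample_rate, len(samples) / sample_rate))
    return silences


# One second of signal, half a second of silence, one second of signal.
audio = [0.5] * 1000 + [0.0] * 500 + [0.5] * 1000
print(find_silences(audio, sample_rate=1000))  # [(1.0, 1.5)]
```

The flagged ranges can then be reviewed and removed together, which matches the review-then-clear flow described above.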

ZxClip detecting silence and non-speech segments in the timeline

ZxClip audio patching using voice cloning to fix a word

Voice cloning

Fix a word without
re-recording

Patch any word or phrase using on-device voice cloning. Type the correction and the app generates the audio in your voice — seamlessly spliced in.

Frictionless edits

Edit fast with ripple edits

Every cut ripples through the timeline automatically. The transcript, waveform, and video track stay perfectly aligned after every edit.
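As an illustration of what a ripple edit does, the sketch below removes a time range and shifts everything after it earlier by the length of the cut. The `ripple_cut` helper and the track layout are hypothetical, and the sketch assumes cuts land on item boundaries, so items that only partly overlap the cut are left untouched.

```python
def ripple_cut(items, cut_start, cut_end):
    """Remove items lying fully inside [cut_start, cut_end] and shift
    later items left by the cut length. Items are (label, start, end)."""
    gap = cut_end - cut_start
    result = []
    for label, start, end in items:
        if start >= cut_start and end <= cut_end:
            continue  # item is inside the cut, so drop it
        if start >= cut_end:
            result.append((label, start - gap, end - gap))  # close the gap
        else:
            result.append((label, start, end))  # before the cut, unchanged
    return result


transcript = [("intro", 0.0, 2.0), ("tangent", 2.0, 5.0), ("outro", 5.0, 8.0)]
video = [("clip-a", 0.0, 2.0), ("clip-b", 2.0, 5.0), ("clip-c", 5.0, 8.0)]

# Applying the same cut to every track keeps them aligned.
print(ripple_cut(transcript, 2.0, 5.0))  # [('intro', 0.0, 2.0), ('outro', 2.0, 5.0)]
print(ripple_cut(video, 2.0, 5.0))       # [('clip-a', 0.0, 2.0), ('clip-c', 2.0, 5.0)]
```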

ZxClip timeline view showing ripple edits keeping clips aligned after a cut

ZxClip screen recording interface with auto-zoom

Built-in recording

Record your screen,
then edit it

Capture your screen with smart auto-zoom, then jump straight into transcript-based editing — no third-party recorder needed.

Features

Everything in the app right now

Transcribe audio

Turn recordings into editable text quickly so you can find and fix parts faster.

Edit audio like text

Simply edit text to generate audio. No re-recording required.

On-device AI models

Powered by AI models that run directly on your device for privacy and speed.

Detect filler words

Detect filler words like "uhm" and other speech disfluencies like "you know".

Silence & dead air detection

Automatically spot silent gaps and moments where no one's speaking — clean up pacing in minutes.

Speech-gap detection

Find segments where no one's talking — background audio, pauses, ambient noise — and clear them in one go.

macOS support

Built for Apple Silicon on macOS 15.2 and above.

Export audio and video

Deliver polished outputs once edits are done, without extra tools.

Screen recording

Capture your screen with smart auto-zoom, then jump straight into transcript-based editing.

Upcoming features

Coming soon

Speech enhancement

Improve speech clarity and remove noise from audio.

Coming soon

More AI models

Expanded model support is on the way, including Pocket TTS and Nvidia Parakeet.

Coming soon

MCP

Connect to AI agents to automate editing your media.

Coming soon

Microsoft Windows support

Use the same workflow beyond macOS when the Windows app is available.

Coming soon

YouTube chapter generation

Auto-generate clean chapter markers from your transcript for faster publishing.

Coming soon

Captions

Create and export timed captions from edited transcripts in a few clicks.

Coming soon

Export FCP XML

Generate Final Cut Pro XML exports for timeline handoff.

Coming soon

Backgrounds

Change or remove backgrounds.

Get a license

One-time purchase. Yours forever.

Unlock exports and full editing features with a one-time license that supports 2 devices on macOS (Apple Silicon).

FAQ

Behind the build

How I built this

A quick breakdown of what it took to build ZxClip.

HashNuke profile picture

I built ZxClip for myself, to edit videos fast. I've had fun building this. I hope you love using it. Below is some quick info about how this app was built.

Chart showing ZxClip editing metrics
  • Tauri with Rust & Swift, along with Stimulus.js (yes!) and Basecoat for the frontend.
  • Ported 4 models to Swift using Apple MLX (for macOS): Whisper, Chatterbox TTS, Wav2Vec2 and Gemma3. This implementation of Whisper comes with some customizations curated from various other implementations.
  • Ported 7 models to Rust with the Burn framework for the Windows version of the app: Whisper, Gemma3, BERT, Wav2Vec2, Vocos, F5 TTS and SoundChoice G2P.
  • Fine-tuned BERT for a new disfluency classification model. This model is used in the app to detect filler sounds and words. Model & dataset will be on Hugging Face soon.
  • Fully built with AI, and a few months of shepherding coding agents.