Demo (48s)
Prefer a direct file? Raw MP4
How it works
- You speak natural language instructions anywhere in the clip (wake phrase: “Hey Cleo”).
- AlgoMommy extracts audio locally and transcribes locally.
- It sends only short text snippets (typically up to ~30 seconds of speech) for segments that appear to address Cleo, plus a list of destination folder paths under the root you choose.
- The service decides where the clip should be copied and what tags/metadata to apply.
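As a rough illustration of the steps above, the request could be assembled along these lines. This is a minimal sketch, not AlgoMommy's actual code or wire format: `Segment`, `list_destination_folders`, `build_request`, and the `snippets`/`folders` keys are all hypothetical names chosen for the example. The only details taken from the description are the wake phrase and the fact that nothing but text snippets and folder paths is sent.

```python
from dataclasses import dataclass, asdict
from pathlib import Path

WAKE_PHRASE = "hey cleo"  # wake phrase from the description above


@dataclass
class Segment:
    """One locally transcribed stretch of speech (hypothetical shape)."""
    start: float  # seconds from the start of the clip
    end: float
    text: str


def list_destination_folders(root):
    """Return every folder under `root` as a sorted, root-relative POSIX path."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()
    )


def build_request(segments, folders):
    """Keep only segments that mention the wake phrase; attach the folder list.

    No audio or video is included, only text and paths.
    """
    snippets = [asdict(s) for s in segments if WAKE_PHRASE in s.text.lower()]
    return {"snippets": snippets, "folders": list(folders)}
```

For example, `build_request(segments, list_destination_folders("/path/to/root"))` would yield a plain dictionary that can be serialized to JSON. A real implementation would presumably also bound snippet length (the description says typically up to ~30 seconds of speech); that is left out here.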
Why “Cleo”
- WhisperKit is accurate but too slow to transcribe an entire long clip end-to-end.
- Apple SpeechAnalyzer is much faster but less accurate.
- AlgoMommy uses SpeechAnalyzer to quickly locate likely “Cleo-addressed” segments, then re-transcribes just those segments with WhisperKit for accuracy.
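The two-pass idea can be sketched independently of either speech engine. In the sketch below the fast pass is represented by its output (rough timestamped text) and the accurate pass by a callable. The names `candidate_spans`, `two_pass_transcribe`, and the `pad` parameter are assumptions made for illustration and are not taken from AlgoMommy, WhisperKit, or SpeechAnalyzer.

```python
from typing import Callable, List, Tuple

Span = Tuple[float, float]                # (start, end) in seconds
RoughSegment = Tuple[float, float, str]   # (start, end, fast-pass text)


def candidate_spans(rough: List[RoughSegment],
                    wake_word: str = "cleo",
                    pad: float = 2.0) -> List[Span]:
    """Find fast-pass segments that mention the wake word, pad them, and merge overlaps."""
    hits = sorted(
        (max(0.0, start - pad), end + pad)
        for start, end, text in rough
        if wake_word in text.lower()
    )
    merged: List[Span] = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def two_pass_transcribe(rough: List[RoughSegment],
                        accurate: Callable[[float, float], str],
                        wake_word: str = "cleo",
                        pad: float = 2.0) -> List[Tuple[Span, str]]:
    """Run the slow, accurate transcriber only on the spans the fast pass flagged."""
    return [
        (span, accurate(*span))
        for span in candidate_spans(rough, wake_word, pad)
    ]
```

The padding is a design guess: because the fast pass is less accurate, its segment boundaries may clip the start or end of an instruction, so widening each span slightly gives the accurate pass some context. The slow transcriber's cost then scales with the amount of Cleo-addressed speech rather than with the clip length.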