Show HN: MP3 to Text (veed.io)
"MP3 to Text" seems very inaccurate since you can only upload video files. In fact, uploading an .mp3 file shows "File type not supported".
edit: I get it, OP just keeps submitting his service with different descriptions until one gets some upvotes. Only took 25 tries to get 30 points. Shameful.
Just goes to show that people upvote anything if it sounds cool, and don't bother checking it out.
We add major new product features every few weeks and think it's important to see whether people like them!
MP3 to text? Why does it ask me to upload a video?
Oops, that's a UX problem. Will get this fixed now.
Very cool, but how do I know which languages are supported? It says "VEED is able to recognise and transcribe languages from all over the world - English, Spanish, French, Chinese, and many more".
From my experience with NLP/AST, the tricky part is getting good models for less common languages.
This is true; we support over 55 languages. The more popular the language, the better the results.
What’s the pricing? What speech-to-text engine is being used?
Clicking on the Sign Up button on iOS Safari does nothing.
Clicking on the Get Started button takes me to an Upload Video form, which is not what I expected from an MP3-to-text service.
Apparently you're limited to 50 MB for free, which doesn't go far when you can upload only video files, not audio.
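To put the 50 MB cap in perspective, here is a back-of-the-envelope sketch in Python. The bitrates (128 kbps for MP3 audio, about 5 Mbps for 1080p video) are illustrative assumptions, not figures published by VEED, and the calculation ignores container overhead and variable bitrates:

```python
# Rough size budget: how many seconds of media fit under a file-size cap
# at a constant bitrate. Container overhead and VBR are ignored.

def max_duration_seconds(size_limit_mb: float, bitrate_kbps: float) -> float:
    """Seconds of media that fit in size_limit_mb (decimal MB) at bitrate_kbps."""
    size_bits = size_limit_mb * 1_000_000 * 8   # MB -> bytes -> bits
    bits_per_second = bitrate_kbps * 1_000       # kbit/s -> bit/s
    return size_bits / bits_per_second


if __name__ == "__main__":
    limit_mb = 50  # the free-tier limit mentioned in the comment
    # ASSUMPTION: illustrative bitrates only, not measured from any service.
    for label, kbps in [("128 kbps MP3 audio", 128), ("5 Mbps 1080p video", 5_000)]:
        minutes = max_duration_seconds(limit_mb, kbps) / 60
        print(f"{label}: about {minutes:.1f} minutes in {limit_mb} MB")
```

Under those assumptions, 50 MB holds roughly 52 minutes of MP3 audio but only about 80 seconds of video, which is why a video-only upload path makes the free limit feel tight.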
This would have been genius in the Napster days of yore: why seek out and download MP3s yourself? Just sit back and have people send them to you! I kid, I kid! ;-)
Is there even a good offline version of this? There are some open-source tools for speech-to-text, but what about batch processing of audio files?
You may be interested in voice2json for offline batch processing: https://voice2json.org
Here's an example using GNU parallel: http://voice2json.org/recipes.html#parallel-wav-recognition
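For anyone who prefers Python to GNU parallel, the same fan-out idea can be sketched with `concurrent.futures`. This is a minimal sketch, not the voice2json recipe itself: it assumes that `voice2json transcribe-wav <file>` transcribes one WAV file and prints the result (verify the exact command and flags against the voice2json docs), and the function names `transcribe_one` and `batch_transcribe` are hypothetical:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# ASSUMPTION: this command transcribes a single WAV file passed as its last
# argument. Check the voice2json documentation for your version's invocation.
DEFAULT_COMMAND = ("voice2json", "transcribe-wav")


def transcribe_one(command, wav_path: Path) -> str:
    """Run the transcription command on one file and return its stdout."""
    result = subprocess.run(
        [*command, str(wav_path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout.strip()


def batch_transcribe(wav_paths, command=DEFAULT_COMMAND, max_workers=4):
    """Transcribe many files concurrently; returns {path: command output}."""
    wav_paths = [Path(p) for p in wav_paths]
    # Threads are enough here: the heavy lifting happens in child processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = pool.map(lambda p: transcribe_one(command, p), wav_paths)
        return dict(zip(wav_paths, outputs))


if __name__ == "__main__":
    # Usage: python batch.py /path/to/wav/dir
    files = sorted(Path(sys.argv[1]).glob("*.wav")) if len(sys.argv) > 1 else []
    for path, text in batch_transcribe(files).items():
        print(f"{path}\t{text}")
```

Each file gets its own subprocess, so `max_workers` caps how many transcriptions run at once; threads suffice because the work happens in the child processes rather than in Python.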
Wow, this is exactly what I had in mind with "open-source tools for speech-to-text". I didn't know it did this too. Thanks a lot!
> voice2json is optimized for:
> Sets of voice commands that are described well by a grammar
> Commands with uncommon words or pronunciations
> Commands or intents that can vary at runtime
Doesn't sound like what you'd want for a generic transcription service.
It supports open-ended transcription too: https://voice2json.org/commands.html#open-transcription
Users have reported good accuracy with the English DeepSpeech profile: https://github.com/synesthesiam/voice2json-profiles
How does this work? And is it more accurate than YouTube's automatic captions?
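One common way to answer "is it more accurate?" is to transcribe the same clip with both services and compute each output's word error rate (WER) against a hand-checked reference transcript. A minimal Python sketch, assuming simple whitespace tokenization and lowercasing with no punctuation normalization (so "dog." and "dog" count as different words); the example sentences are made up:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a word-level Levenshtein (edit distance) table."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev holds the previous row of the DP table; row 0 is the cost of
    # producing the first j hypothesis words from an empty reference.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missed)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown fox jumped over a lazy dog"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Lower is better. Scoring YouTube's captions and VEED's output for the same clip against one reference gives a like-for-like comparison, though a single clip says little about overall accuracy.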
Multi-speaker?
Reminds me of Descript.