Show HN: Audino – Open-Source Audio and Speech Annotation Tool

github.com

123 points by manrajsingh 6 years ago · 41 comments

jcims 6 years ago

You're likely to find people using this to build solutions for remote depositions (in the US). That space seems fairly ripe for disruption, and the pandemic is exacerbating the demand.

Is there a recorded demo of it somewhere? Would be nice to see it in action as I'm having a little trouble understanding the workflow.

  • manrajsinghOP 6 years ago

    We're working on a recorded demo. For now, the tutorial section explains the workflow well (with screenshots).

nayuki 6 years ago

Nice choice of Pokémon. https://bulbapedia.bulbagarden.net/wiki/Audino_(Pok%C3%A9mon...

  • manrajsinghOP 6 years ago

    To be honest, it's a happy coincidence. The name is an amalgamation of audio and annotation.

  • ngngngng 6 years ago

    Audino (Japanese: タブンネ Tabunne) is a Normal-type Pokémon introduced in Generation V.

    While it is not known to evolve into or from any other Pokémon, Audino can Mega Evolve into Mega Audino using the Audinite.

wgerard 6 years ago

This is incredible!

While we were trying to build up a corpus of transcription data for our company, I often thought we should build something like this and open-source it. We ended up building a one-off hacked-up thing to do it instead, but I'm really glad this exists for any future people in our shoes.

Annotating speech data is super tedious and anything that improves the process even 10% is a huge, huge win.

  • manrajsinghOP 6 years ago

    We used to do the same. Hence, we developed this tool to mitigate a lot of the pain points of previous tools. Thanks for sharing your experience; we would love to hear how you get on with this tool.

lostgame 6 years ago

The name is incredibly easy to confuse with Arduino.

Maybe ‘audinote’ would be not only less confusing but also clearer about what the app does?

blipmusic 6 years ago

How does this compare to ELAN (https://archive.mpi.nl/tla/elan) with regard to doing the actual annotations/transcriptions? Or could ELAN/EAF files perhaps be considered as input formats in future releases?

tmaly 6 years ago

What are some example usages of this?

I could not really tell right away looking at the docs. Why would I want to use this?

  • manrajsinghOP 6 years ago

    At our lab, we work extensively on problems that involve speech data. This includes tasks like speech recognition, speech scoring, emotion recognition, topic detection and speaker diarisation. Some of these tasks have public data available, while for tasks like speech scoring and low-resource speech recognition, the data available for supervised learning is fairly limited. Hence, we developed this annotation tool to generate corpora for our needs.

    • mkagenius 6 years ago

      In case it's still not clear: it does not do the transcription, it does not. Oh hi Mark. It asks you to annotate the audio manually (in case you want to prepare a training data set for your algorithm); it's not an AI algorithm.

      • jtbayly 6 years ago

        This is the most helpful comment here. I still don’t understand what the tool is for though. Up until now I assumed it would allow me to get automatic transcriptions, including breaking them down by speaker.

        • fluential 6 years ago

          I was looking into that space recently and I have used otter.ai for transcriptions which gives you 6000 minutes/month for 8 USD, which is insanely cheap in that space. Their British language model is quite good as well.

          I’ve bulk-exported the generated srt/vtt files for my favourite podcasts, and I use tinysearch (posted here recently) together with ableplayer to provide full-text search over the audio of the podcast posts I publish with Jekyll, with clickable timestamps that start playback at the matching phrase.

          Whenever I want to know what a podcaster has to say on a specific subject, a quick search makes such a difference!
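For anyone wondering what searching exported captions by phrase might look like, here is a minimal sketch. It is a plain case-insensitive substring scan over WebVTT cues, not tinysearch or ableplayer, and the function names `parse_vtt` and `search` are made up for illustration. It only assumes the export is standard WebVTT with `mm:ss.ttt` or `hh:mm:ss.ttt` timings.

```python
import re

# Cue timing line, e.g. "00:00:01.000 --> 00:00:04.000" (the hours part is optional).
CUE_TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def parse_vtt(vtt_text):
    """Return a (start_seconds, text) tuple for each cue in a WebVTT string."""
    cues = []
    # Blocks (header, cues, notes) are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = CUE_TIMING.search(line)
            if m:
                hours, minutes, seconds, millis = m.group(1, 2, 3, 4)
                start = (
                    int(hours or 0) * 3600
                    + int(minutes) * 60
                    + int(seconds)
                    + int(millis) / 1000
                )
                # Everything after the timing line is the cue text.
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append((start, text))
                break
    return cues


def search(cues, phrase):
    """Return the cues whose text contains `phrase`, ignoring case."""
    needle = phrase.casefold()
    return [(start, text) for start, text in cues if needle in text.casefold()]
```

Each hit carries the cue's start time in seconds, which is the value a player would need in order to seek to the matching phrase.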

          • jtbayly 6 years ago

            Awesome. Thanks for the info. I look forward to trying out your suggestion.

    • rock_artist 6 years ago

      So this tool is mostly a way to store your dataset?

      E.g., should things like forced alignment be done in other tools, with the API then used to put the results into the dataset?

jononor 6 years ago

Is there an API for getting the data to annotate into that app, and for getting the annotations out?

Most of the time I need the labels in my own system, and don't want to manually move data back and forth.

TACIXAT 6 years ago

If I had a subtitle file to use as best guesses for sentence segmentation, could this help extract clips and clean up start and end alignment?

  • manrajsinghOP 6 years ago

    Interesting use case! Currently, the tool allows creation of datapoints along with reference transcripts. From what I understand, you wish to fix the subtitle start and end times while keeping the transcription for each segment the same. If so, we plan to add an enhancement where you can pass in annotations (i.e. segments with transcripts). This should solve your use case.

    • TACIXAT 6 years ago

      That would be awesome. My hacky solution was a waveform and start and end sliders. It would just iterate through and you could accept, reject, or modify the times and text.
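A rough sketch of the first half of that workflow, turning a subtitle file into candidate segments to review: it assumes a standard SRT file, and `Segment`, `parse_srt`, and `pad_segments` are hypothetical names, not part of Audino. The padding step is just one guess at how to leave room for adjusting the start and end times by hand.

```python
import re
from dataclasses import dataclass

# SRT timing line, e.g. "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text):
    """Return one Segment per subtitle cue in an SRT string."""
    segments = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Lines after the timing line are the transcript for this cue.
                text = " ".join(lines[i + 1:])
                segments.append(Segment(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return segments


def pad_segments(segments, pad=0.25, duration=None):
    """Widen each segment by `pad` seconds per side, clamped to the audio length."""
    padded = []
    for seg in segments:
        start = max(0.0, seg.start - pad)
        end = seg.end + pad
        if duration is not None:
            end = min(duration, end)
        padded.append(Segment(start, end, seg.text))
    return padded
```

The padded segments could then be shown one at a time over the waveform, with the reviewer accepting, rejecting, or editing the times and text.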

jcims 6 years ago

FWIW changing version to "3" in the compose file was necessary to get it to build with the latest release of docker-ce.

donpark 6 years ago

README.md file could use an image. I recommend the one from this page: https://github.com/midas-research/audino/blob/master/docs/tu...

  • manrajsinghOP 6 years ago

    Thank you for your suggestion! We're working hard to get a demo video out and intentionally left space for it. But sure, we can add a placeholder image till then.

sheeeep86 6 years ago

This could be cool for analyzing lectures

4ndrewl 6 years ago

Maybe it's me, but I was expecting this to be something to do with Arduino, given the name and the colour they've chosen for the logo.

classified 6 years ago

OMG, that list of frontend dependencies is just soul-crushing. How does anyone stay sane using NodeJS?

  • masonhensley 6 years ago

    Actually, it wasn’t bad when I looked at it. I’ve seen much, much worse (5x worse).

    A few font-awesome, testing-library, ES-Lint & react imports. Some of those broader libraries have been broken up so you don’t have to import the whole enchilada.

    But ya, in a larger project, mixing and matching the versions of some of those components can get tricky. This repo seems reasonable in its dependencies; the dependencies of dependencies, on the other hand, can be crazy in any project these days.

    • chrismorgan 6 years ago

      yarn.lock is just under half a megabyte, and lists 1461 packages that it installs. (232 of them are second or subsequent versions of the same package, which typically indicates unmaintained software. It has five versions of kind-of, and four versions of ten other packages.)

      • masonhensley 6 years ago

        Ya, that's not great - don't think the parent project of this post has gone off the rails though. More of an ecosystem problem.

        • manrajsinghOP 6 years ago

          I think you should refer to package.json for the actual dependencies. But yes, I agree that the tool's direct dependencies themselves pull in a lot of other packages. I'll evaluate and reduce the tool's dependencies, if possible.

          That being said, the gzipped js bundle size is fairly small (under 200kb).

  • jononor 6 years ago

    NodeJS == backend

    • silviot 6 years ago

      I'm afraid you'll need to revisit this "fact".

      The project in question, for instance, only uses NodeJS to build its frontend. The backend is written in python.

      People who want to use react for their frontend _have_ to use NodeJS, for instance.

yamank 6 years ago

We want to hear more about how the tool works for speech annotation. Please try it and let us know.

  • jtbayly 6 years ago

    We, as in the developer? Or we as in a potential user?

    I’m a potential user, and I’d certainly like to hear more from somebody who has tried it.
