Future of DeepSpeech / STT after recent changes at Mozilla

discourse.mozilla.org

147 points by trowngon 6 years ago · 74 comments

jarym 6 years ago

Maybe we should try to find a list of exactly what they are focussing on going forward instead of the slow drip of things they’re cutting back on (servo, MDN, DeepSpeech...)

It’s a sad sad day when you have an organisation getting hundreds of millions in funding and turning away from what it’s good at. The decline has begun in my eyes, it may not become apparent for a few years yet.

  • mrweasel 6 years ago

    Cutting out DeepSpeech seems sensible to me, it’s out of place in the general portfolio of products.

    It would be nice if Mozilla could tell us what their focus is going to be, but I doubt that Mozilla management know at this point.

    At this point I’m somewhat concerned that Firefox will be irrelevant in five years, and I don’t currently feel that Mozilla is communicating clearly that they still care about Firefox. I assume they must, but it would be comforting to know that Firefox is still at the core of Mozilla’s strategy.

    • Cybiote 6 years ago

      > Cutting out DeepSpeech seems sensible to me, it’s out of place in the general portfolio of products.

      I disagree precisely because of the point you make later: "I’m somewhat concerned that Firefox will be irrelevant in five years".

      Functionality provided by deep learning is going to be an important component of many types of software interactions going forward. The logistics of this will be quite different from what we are used to in open source, with the need to fund and coordinate compute, collect and handle data being a more vital aspect compared to the past.

      There is STT software, some of it mentioned in this thread, that matches or even beats DeepSpeech, but none of it is as ergonomic. Accounting for the value of time, this means it will be more cost effective to outsource such capabilities to the cloud. That comes with trade-offs that are difficult to appreciate in the short term: https://news.ycombinator.com/item?id=24236489

      I'd say DeepSpeech fits in the mold of Mozilla as a company providing solutions to complicated software problems that are better at respecting the user and their privacy.

      In the old days, the most accurate TTS and STT models were built into the OS. These days, you need to call into the cloud to get the best stuff. In [1], Internet Archive complains about the quality of their OCR software. It's not that OCR is so bad, it's that the best OCR is found on Google's and Microsoft's computers. It's possible to cobble something together using open source solutions like EasyOCR, Tesseract+OpenCV but that will only get you part of the way there. What makes the cloud offerings so good is they have enough resources to devote to pre-processing pipelines and architecture tweaks and settings better able to handle edge cases. Most of the mass resides in edge cases.

      From my vantage, the future looks to be one of software as thin layers built atop APIs which call into programs running on the servers of a handful of companies. You might not think this a big deal but these software will be the ones scanning the environment, writing the emails, completing the thoughts and planning the calendars for the majority of humans.

      [1] https://blog.archive.org/2020/08/21/can-you-help-us-make-the...

      • posguy 6 years ago

        Based on the testing I just did with Vosk, Mozilla DeepSpeech, Google Speech to Text and Microsoft Azure, I disagree with your argument that SaaS has the best quality results.

        Mozilla DeepSpeech was definitely trailing the bleeding edge, but Vosk using the vosk-model-en-us-daanzu-20200328 model produces very accurate results even on uncommon words, similar in performance to Google & Microsoft (which has generally better formatting than Google's STT)

        Try it yourself:

        Google: https://cloud.google.com/speech-to-text/ See "Put Speech-to-Text into action" header

        Microsoft: https://azure.microsoft.com/en-us/services/cognitive-service... See "Upload File"

        Vosk: https://alphacephei.com/vosk/

        Had Mozilla provided 4x to 8x more GPU resources and more staff, then their STT would likely be competitive. Other small STT developers can iterate and test much faster due to having more hardware at their disposal.

      • NegatioN 6 years ago

        Even Google is trying to offload as much of these computations to on-device chips as possible nowadays though.

        Their new Pixel has voice control entirely backed by on-device models for example.

        I think SaaS is a stopgap for good ML, and that eventually enough of this will be open source, that basic tasks such as vision and speech will be cheap to solve for any company with high tech competency.

  • ldayley 6 years ago

    Is now a good time for someone to write the "Unbundling Mozilla" start-up post on substack? I'd love to see something cogent written up about it. Something like this[0]?

    [0] - https://latecheckout.substack.com/p/the-guide-to-unbundling-...

    EDIT: Add link

echelon 6 years ago

I'm not sure about Mozilla's efforts in STT, but they were lagging pretty far in TTS. [1]

Google/Baidu, universities, and an assortment of Chinese/Japanese/Korean social media companies (Line, etc.) are posting the most compelling TTS research, models, and code. Mozilla's TTS system [2] is an amalgam of some of these models, but it lags pretty far behind state of the art.

Mozilla should focus on getting additional revenue streams. We can help them out by trying to get Congress / DOJ to strip Google of its ability to have and maintain a browser with which they entrench their search and advertising moat. I think they're clearly in antitrust/anticompetitive territory.

[1] I'm pretty familiar with this field as I wrote https://vo.codes and https://trumped.com TTS systems. Neither of those are state of the art in terms of mean opinion score (MOS), but they're incredibly efficient.

[2] https://github.com/mozilla/TTS

  • nshm 6 years ago

    It is explainable given that there was a single developer working on TTS. It is hard to compete with big academic teams/industry players this way.

    I also believe Mozilla team was restricted by a lack of computing resources. They had just a single 8GPU server or so.

    • posguy 6 years ago

      Said 8 GPU server was consistently in use for Mozilla DeepSpeech (now renamed Mozilla STT) in training models. It's impressive how far Mozilla got considering how limited their resources were.

    • qchris 6 years ago

      This is an area that I find unbelievably frustrating. A lack of computing resources in the current day is kind of insane. You can buy an 8GB GPU for <$1000. Even with the rest of the costs, the cost of hardware like this is a drop in the bucket when your main office is housed in Mountain View! Especially on a project that ends up being public-facing, these are missed opportunities where a little can go a long way.

      • nmstoker 6 years ago

        I take your point, but according to the release details on the repo it was not 8GB on one card but a server with 8 cards, each a Quadro RTX 6000 with 24GB, and they're around £4k each currently, so the cost of the GPUs alone is £32k.

        https://github.com/mozilla/STT/releases/tag/v0.8.2

        • qchris 6 years ago

          Ah, I see-- not an 8GB, 1-GPU server, but an 8-GPU server. That does make a bit of a difference, changing the cost from a new workstation to functionally a piece of capital equipment. Still, I'm not sure that my point about equipment costs falls short--even at (call it) $40K, you're probably talking less than 3 months of the company's all-in cost for the developer themself, amortized over multiple years.

      • panpanna 6 years ago

        We need a SETI@home approach to open source AI models.

        Only then we can break our dependency on Google and Facebook - and Mozilla for that matter.

  • chromedev 6 years ago

    Chromium is open source and you can apply policies to do the things you mention. Based on your logic Mozilla should also be forced to get rid of Firefox Sync.

    • echelon 6 years ago

      Chrome is shoved down grandma's throat. She probably doesn't know much other than it's the "Google Internet thing". It's the default on Android and Google.com nags you to install it.

      This is worrying given that Google cripples the browser and web standards to favor its own search engine and advertising platform.

      Killed the semantic web and semantic markup? Check.

      Disabled APIs for blocking ads? Check.

      Use Google.com as the default search? Yep.

      Embrace and extend the web with AMP and instant apps? Bingo.

      Auto log into your Google session or nag until users permit it? Absolutely.

      Trying to destroy the notion of a URL? I thought those were cool.

      Google is destroying the web and is about as anti-competitive as they come.

      • chromedev 6 years ago

        > Killed the semantic web and semantic markup? Check.

        Based on what evidence?

        > Disabled APIs for blocking ads? Check

        They didn't. uBlock Origin and adblocker extensions never stopped working.

        > Use Google.com as the default search? Yep.

        What do you think Edge does here? Easily changed via policies.

        > Auto log into your Google session or nag until users permit it? Absolutely

        Doesn't nag you and easily disabled in settings or via policy.

        > Trying to destroy the notion of a URL? I thought those were cool.

        I only get a little frustrated on Android, but just have to remember to hit the edit icon if I want to change it.

        • sjagoe 6 years ago

          > > Disabled APIs for blocking ads? Check

          > They didn't. uBlock Origin and adblocker extensions never stopped working.

          That was probably this issue in the chromium tracker https://bugs.chromium.org/p/chromium/issues/detail?id=896897...

          I don't know what happened after that though; the conclusion of that issue (in Jan 2019) was "these changes are draft, and still being discussed".

taf2 6 years ago

I see a lot of what appears to be overreaction... it doesn’t sound like DeepSpeech is ending, going by the first part of the announcement

“Most of the technical changes were already landed, and we see no reason not to ship it. We’ll be releasing 1.0 soon and encourage everyone to update their applications”

So looks like at least 1.0 is near and still gonna happen... I know these seem like dark times for Mozilla but I believe they will survive. As I recall the decline of Netscape was a pretty dark time and out of that came Phoenix - er Firefox and here we are today... I’m sure Mozilla and many of the great projects will survive

no_wizard 6 years ago

I don’t know what is going to save Mozilla, really I don’t. I just wish there was a way to “reach” them and discuss how we the internet community could come to an agreement about what they could do to derive value we would pay for.

It’s not for a lack of trying on their part for sure, but it feels like just using their browser isn’t all there is to it any more

  • narag 6 years ago

    > what they could do to derive value we would pay for

    For someone that found Linux in the 90's and watched the birth of Mozilla from the ashes of Netscape, that's a very strange thing to read.

    This site is not Slashdot, I know. It always had another kind of relation to business and money. But still...

    I have no idea why Mozilla should need a business model. Much less I understand why should we think of one and agree on it.

    How much money does it take to maintain a web browser? If it's a lot, maybe, just maybe, we should agree on a reduced feature set and refuse to use something more complex. Some people here talk about text mode browsers. I'm not so radical. Just keep it simple enough to be maintainable by a dozen volunteers.

    • bn-usd-mistake 6 years ago

      Why? Should we apply the same logic to Linux? Why should we arbitrarily restrict user value because something costs money?

      Isn’t the main problem that users are not willing to pay for the browser they use?

      Google Chrome is probably maintained by much more than 12 people, so if we restrict Firefox to that, everyone is just going to move to Chrome anyways.

    • strictnein 6 years ago

      > I have no idea why Mozilla should need a business model.

      Because developers aren't free and "let's get money from Google searches" is great until Google decides not to fund a competitor any more.

  • juststeve 6 years ago

    Building B2B services around Rust (i.e. onsite training, consulting, development) seems better to me than firing people - what am I missing here?

    • ralph84 6 years ago

      Almost all company-sponsored programming languages are run as loss leaders to enable selling some other profitable product of the company. What is the profitable product that Rust enables?

      • juststeve 6 years ago

        Well an IDE would’ve been one option, as well as backend services for enterprise who are migrating to Rust. Otherwise as I mentioned the product is services like outsourced development, consultancy and training resources?

      • yjftsjthsd-h 6 years ago

        > What is the profitable product that Rust enables?

        Surely that's Firefox?

        • treis 6 years ago

          Nobody is building anything based on Firefox. It's not like Rails or .NET that gives your application a head start.

          • InfiniteRand 6 years ago

            People used to build a lot of software around Gecko, there are still some notable users like Komodo IDE, but Firefox is a lot harder to embed than it once was. Servo from the Rust team was supposed to solve this by providing a new embeddable browser core, not sure if that is still the long term plan

        • jacquesm 6 years ago

          Firefox apparently is no longer a focus because it is hard to monetize outside of the search box, see earlier letter. I would definitely not take Firefox's future for granted at this point.

          • liability 6 years ago

            Firefox is the only thing Mozilla has ever been able to make any money with; anything else has gotten them a pittance at best.

            Giving up on that because it's 'too hard', without first proving they have an alternative? That would be insanely foolish. They may as well close up shop now if that's their plan.

  • jrochkind1 6 years ago

    Has "the internet community" ever "come to an agreement" on literally anything?

  • floatingatoll 6 years ago

    What would you personally pay Mozilla for?

    • echelon 6 years ago

      Firefox, Rust, and privacy.

      It'd be really awesome if they could develop a search engine or phone (I know they tried) that had an open standards / web-compatible development kit.

      I want an anti-Google / anti-Apple. Something we own and can extend. Something that doesn't sell our data.

      I'd also like to see Mozilla doing lobbying. Partnering with the EFF. We've strayed so far from the bright and open Internet of the 90's and 00's. It's depressing to think about how locked up and proprietary it's all become.

      I'll buy Mozilla / Firefox merch. I'll pay a subscription.

      edit: Talk to Shuttleworth. Fold Ubuntu in. I'll buy a Mozilla phone and a Mozilla laptop.

      • feanaro 6 years ago

        I feel bad for doing a "me too" comment, but you've nailed exactly my thoughts on the subject. I feel like Mozilla hasn't really tried something like this. Every time it gets suggested, it quickly gets shot down (by other internet commenters) as "can't be done" and "wouldn't generate nearly enough money".

        Well... maybe not with that CEO salary.

        • echelon 6 years ago

          Mozilla can model itself after Microsoft somewhat.

          Provide a development stack (they're experts at Web and Rust). Make themselves the go-to shop for developers in that realm.

          Sell them on an OS and editor with support. Partner with Ubuntu. Hell, I would even reach out to Nadella and see if they'd be willing to work with Mozilla on hedging against Google. Mac is becoming locked down and kind of unpleasant to develop on/for. Mozilla could win this.

          Block all the advertising and tracking. Build a Spotify-like news aggregation service you can access from your Mozilla subscription.

          Build an email service like Hey and a file backup service like Dropbox. It's too bad Zoom bought Keybase, but perhaps Chris Coyne wants a new gig?

          We should team up to beat FAAMG. Most of the FAAMG actors are actually quite damaging to open source despite benefiting from it greatly.

          • no_wizard 6 years ago

            This all sounds to me like capital-intensive businesses against entrenched players, where even the not-so-average consumer would likely do no more than pay lip service unless there was some secret sauce that made it more compelling than the other options.

            They need a good out-of-the-park product in those markets to make any real headway. Too idealistic.

            My only thought on this is that they should pivot to be like Algolia, focus on Firefox being a reference implementation browser and sell their expertise to the other vendors, maybe. It’s one of the few verticals I can think of that would work strategically without them having to pivot into things they have no experience with.

            • qchris 6 years ago

              Do they? I mean, most of these vendors are already competing, and unlike Firefox, they're not necessarily competing for the average Joe, but technical users who often have different priorities.

              Those are also services that groups are used to paying for already, which means if they could eat the start-up costs, even at a reduced scale, they could make a profit at even a slight premium for things that they already do very well, and go from there.

    • qchris 6 years ago

      I'm already personally paying Mozilla $8/mo for their VPN and private browser extension.

      If they offered something like the services offered by mailbox.org, or Librem One? I'd switch my GMail account tomorrow, including the storage fees I'm paying on it, and would do it at triple the cost for not abusing my data. Hell, they already have the domain experience with their proximity to Thunderbird devs.

    • tchaffee 6 years ago

      MDN, Firefox (voice search would be nice), and anything that is a replacement for Google products.

eruleman 6 years ago

Does anyone know of other open-source projects in the speech-to-text space? DeepSpeech was one of the most promising projects, especially the latest versions...

  • nshm 6 years ago

    Try https://github.com/alphacep/vosk-api. It supports 10 languages, works on Android and RPi and also has big and more accurate server models.

    Other good ones are https://github.com/daanzu/kaldi-active-grammar and https://talonvoice.com/

    There are toolkits for research like https://github.com/kaldi-asr/kaldi, https://github.com/espnet/espnet, wav2letter, Espresso, Nvidia/Nemo, https://github.com/didi/athena. You can try them too if you want to go deep. Some of them have interesting capabilities.

    • posguy 6 years ago

      Comparing DeepSpeech v0.7.4 to Vosk using plain spoken English samples from male and female speakers, they seem to be performing the same if I use vosk-model-small-en-us-0.3 and the full size DeepSpeech model.

      When I use vosk-model-en-us-daanzu-20200328 the result is perfect on many of these tests, though it does not do punctuation or capitalization outside apostrophes. IIRC there is another project on Github that can add basic formatting though.

      I am quite surprised by Vosk's performance, it even handles odd words like Puget Sound well! Need to test out more accented audio on it, but this is quite exciting.

  • albertzeyer 6 years ago

    There are a lot of open source projects in this space. DeepSpeech is actually one of the outsiders (they are not represented well in the academic community), and also not quite competitive to other software (at least last time I checked).

    E.g. some very active projects are:

    * Kaldi (https://github.com/kaldi-asr/kaldi/) obviously, probably the most famous one, and most mature one. For standard hybrid NN-HMM models and also all their more recent lattice-free MMI (LF-MMI) models / training procedure. This is also heavily used in industry (not just research).

    * ESPnet (https://github.com/espnet/espnet), for all kind of end-to-end models, like CTC, attention-based encoder-decoder (including Transformer), and transducer models.

    * Espresso (https://github.com/freewym/espresso).

    * Google Lingvo (https://github.com/tensorflow/lingvo). This is the open source release of Google's internal ASR system, and used by Google in production (their internal version of it, which is not too much different).

    * NVIDIA OpenSeq2Seq (https://github.com/NVIDIA/OpenSeq2Seq).

    * Facebook Fairseq (https://github.com/pytorch/fairseq). Attention-based encoder-decoder models mostly.

    * Facebook wav2letter (https://github.com/facebookresearch/wav2letter). ASG model/training.

    * (RETURNN (https://github.com/rwth-i6/returnn) and RASR (https://github.com/rwth-i6/rasr), our own, although this is currently free for academic use only. It is used in production as well. Supports hybrid NN-HMM, CTC, end-to-end attention-based encoder-decoder, transducer, etc.)

    And there are many more.

    You will also find lots of ready-to-use trained models.

    • convery 6 years ago

      You seem to know a lot about the topic, any idea about the current state of text-to-speech? Haven't seen any opensource projects that would make, for example, an ebook enjoyable.

      • nshm 6 years ago

        Recent more or less reasonable one is https://github.com/TensorSpeech/TensorFlowTTS, it implements all the latest algorithms. For simple business books it will be ok, for emotional fiction probably not there yet.

        • liability 6 years ago

          Extant TTS is already there for fiction, if you approach it with the right expectations (more an alternative to visual reading than dramatically read audio books.) I've 'read' numerous fiction books using MacOS's TTS ('Alex') and with my kindle (3rd gen 'keyboard' model from 2010.)

          These extant solutions require an effort-investment from the user to work up to fast speeds, but once the user becomes acclimatized they work great. The neuroplasticity of the human brain seems to do a great job of smoothing out the wrinkles.

          • JZL003 6 years ago

            I agree - I've been using google's TTS api for audiobooks and it's great. I switch off between professional audio books (overdrive is amazing and free by public libraries) and TTS and, while professionals can add something, you get used to TTS pretty fast. Google's TTS gives 1 million free characters a month, which is pretty generous for a single person and it sounds pretty good. I read books with pretty weird character names (like the Wandering Inn web serial) and it never explodes. Sometimes it spells out character names but even for very non-standard names, it does fine.

            I've experimented with some of tacotron TTS/espnet to do the TTS on my computer and they work alright. Sometimes you get weird edge cases and it makes some pretty weird sounds (and even if your laptop doesn't have a GPU, google co-lab works well for quick audiobook generation). I don't hit the million characters that often so it hasn't been a big deal but I'll probably move to home-made just because I like tweaking it.

            The way I think about it is that the written word doesn't have much intonation anyway so as long as the audiobook doesn't offend me, it's a pretty good solution (and helps prevent eye strain after working on a computer all day)

    • Bootwizard 6 years ago

      Can you run audio files through any of these or do they only support audio from microphones?

      • nmstoker 6 years ago

        At the point of them taking in input to process, audio that comes from a microphone or comes from a file is basically just a series of numbers and is the same. So there's no barrier in terms of feasibility.

        Whether they're all set up to do that "off the shelf" is a different matter but it should be fairly straightforward to add this to any that lack it and because they're open-source anyone could do a bit of Googling etc and find suitable code to adapt to do it. I know DeepSpeech definitely can take audio from files directly as input as I've used it that way before, and I strongly expect many (or possibly all) of the others could too.
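
        To make the "just a series of numbers" point concrete, here is a minimal sketch using only Python's standard library. It is not taken from DeepSpeech, Vosk or any other project in this thread; the helper names (write_test_tone, read_mono_pcm16) and the 16 kHz sample rate are assumptions chosen for illustration. It writes a short mono 16-bit WAV file and reads it back as a plain sequence of integers, which is roughly the form a recognizer would consume whether the audio came from a file or a microphone buffer.

```python
import array
import math
import os
import sys
import tempfile
import wave


def write_test_tone(path, sample_rate=16000, n_frames=1600, freq=440.0):
    """Write a short mono 16-bit PCM sine tone so the example has a file to read."""
    samples = array.array("h", (
        int(16000 * math.sin(2 * math.pi * freq * i / sample_rate))
        for i in range(n_frames)
    ))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is stored little-endian
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)
        w.writeframes(samples.tobytes())


def read_mono_pcm16(path):
    """Return (sample_rate, samples) where samples is an array of signed 16-bit ints."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected a mono 16-bit PCM WAV file")
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()
    return rate, samples


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "tone.wav")
    write_test_tone(path)
    rate, samples = read_mono_pcm16(path)
    print(rate, len(samples), list(samples[:5]))
```

        The same array of integers could just as well have been filled from a microphone callback; what each toolkit then expects (sample rate, chunking, exact API) varies and should be checked against its own documentation.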

      • posguy 6 years ago

        DeepSpeech and Vosk can accept audio files, although each wants them formatted in a slightly different mono WAV format.

        See my other comment for a comparison of the two: https://news.ycombinator.com/item?id=24248238

  • kouohhashi 6 years ago

    deepspeech.pytorch is a good one. Since Mozilla's DeepSpeech project is still using TensorFlow 1.x, I think the PyTorch implementation is actually better. https://github.com/SeanNaren/deepspeech.pytorch

ianlevesque 6 years ago

Between this and Servo I guess Mozilla is just giving up on relevance. That really sucks.

  • marcinzm 6 years ago

    That’s what got them in this mess in the first place, fifty pie in the sky projects to be relevant instead of focusing on Firefox or just saving revenue aggressively.

utunga 6 years ago

I work with Mozilla's DeepSpeech every day. Mozilla's STT is critical to the survival of important indigenous languages throughout the world.

I sincerely hope we can help make this project continue and that Mozilla can help us do that.

Ensuring indigenous languages have digital representation is essential to their survival. Speech recognition and synthesis are a vital part of that. Indigenous communities are often ignored by Big Tech because they bring little financial value to their bottom lines, but financial bottom lines are not everything. Culture is more important. Open source tools like DeepSpeech allow communities to build the tools they need for themselves.

Māori have been working to help build tools for te reo Māori, and our project is at the forefront of using open source tools like DeepSpeech to revitalize the Māori language. The core of a good speech recognition system helps us in many practical ways, such as improved transcription, support for pronunciation, correct announcements in public transport, correct information on maps and in many other ways. We may well continue to support and use DeepSpeech if the project can continue.

But there are also many other projects in other countries in the world who may follow on - such as the Kabyle people of Algeria who are using DeepSpeech, or the Mohawk nation in North America who have been looking into it.

By the way we are working on our web presence but for now this quick one pager gives some idea of the work we are doing - https://papareo.nz.

  • nshm 6 years ago

    Is your current data public? Do you have sufficient amount of untranscribed data (1000+ hours)? We could help you.

bananaface 6 years ago

Are they still collecting data for Common Voice, or is it both projects that they're terminating?

awalton 6 years ago

Ugh this is a deep gut punch, as this is one of the most interesting recent projects Mozilla was working on in my opinion.

_underfl0w_ 6 years ago

Yikes. All this because they refuse to trim the fat at the C-level. A company can't be profitable by only employing overhead. They'll all be forced to take the ultimate pay cut when Mozilla closes up shop.

milofeynman 6 years ago

I was hoping DeepSpeech would lead to in home cloud-less "Alexas". Just ask me for a subscription on it and productize it please.

markthethomas 6 years ago

DS is by far the easiest to use/most promising ASR library/kit/thing I’ve used; really hoping it keeps going

_8j50 6 years ago

Are there other foundations like Mozilla we can donate to? For initiatives that are in the interest of the public? The Apache foundation is all I can think of but they focus on corporate use projects.

Causality1 6 years ago

Really a shame. I find so many of Google's STT quirks infuriating enough I'd love a robust alternative.

neolog 6 years ago

Where does it say DeepSpeech is on hold? I don't see that anywhere.

  • dang 6 years ago

    Submitted title was "Mozilla to put DeepSpeech project on hold". We've replaced that with the article title per this guideline: "Please use the original title, unless it is misleading or linkbait; don't editorialize." https://news.ycombinator.com/newsguidelines.html

    • Vinnl 6 years ago

      I guess that explains why I was under the impression the project was being shuttered after reading the comments, but not the actual post yet.

      The actual forum post just says they don't know anything about the future of DeepSpeech yet, for those doing the same.

  • Ensorceled 6 years ago

    > Until a proper decision is being made regarding the future of the project, we will “keep the lights on” and try to address existing issues and review your contributions to the best accommodation we can in the scope of our new roles.

    You could say that "keep the lights on" is the same as on hold.

jonny383 6 years ago

Mozilla let politics take over its corporation to the point where it's basically a far left extremist group now that's relying on semi-bribe funding from Google.

There's no technology left anymore.
