Show HN: ReadToMe (iOS) turns paper books into audio

readtome-app.com

74 points by kolchinski 2 years ago · 48 comments · 3 min read

I'm launching something that started as a side project publicly today: ReadToMe, which is an iPhone app that turns paper books and other printed text into audio.

Originally this was a Christmas present for my fiancée, who loves books but has an eye problem that makes it hard for her to read more than a few pages at a time. She mostly listens to audiobooks while following along with the paper book, but some books aren't available in audiobook or even e-book form, and all of the existing apps we tried were surprisingly bad at scanning paper books into audio — they make lots of mistakes, include footnotes and page numbers, etc., in a way that really degrades the experience.

Being an AI-oriented engineer by training, I had a crack at solving the problem myself, and was pleasantly surprised at how well the proof of concept worked. I then had some time free while shutting down my previous company (Mezli, YC W21), during which I polished up the app to the point you see it at now.

The way it works:

On the front end, it's a SwiftUI app (mostly written by ChatGPT!) that consists mostly of a document scanner (VNDocumentCameraViewController) and a custom-built audio player.

The back end is more complex — book photos are first sent to an OCR API, then some custom code I wrote does a first pass at stitching together and correcting the results. Then, the corrected OCR results are sent to GPT-3.5-turbo for further post-processing and re-stitching together, and finally to a text-to-speech API for conversion to audio.
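As a rough illustration of the flow described above (OCR, a first stitching pass, LLM cleanup, then TTS), here is a minimal Python sketch. It is not the app's actual code: `stitch_pages`, `scan_to_audio`, and the stand-in stage functions are hypothetical names, and the stitching rules (drop bare page numbers, rejoin words hyphenated across a line break) are assumptions about what such a first pass might do.

```python
from typing import Callable, List


def stitch_pages(pages: List[str]) -> str:
    """Join per-page OCR text: drop blank and bare page-number lines and
    rejoin words that were hyphenated across a line break."""
    out = ""
    for page in pages:
        for line in page.splitlines():
            line = line.strip()
            if not line or line.isdigit():
                continue  # blank line or bare page number
            if out.endswith("-"):
                # Naive rule: this also merges real compounds split at a line end.
                out = out[:-1] + line
            elif out:
                out += " " + line
            else:
                out = line
    return out


def scan_to_audio(
    photos: List[bytes],
    ocr: Callable[[bytes], str],
    llm_cleanup: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run the four stages in order; each stage is passed in as a callable."""
    pages = [ocr(photo) for photo in photos]   # 1. OCR every photo
    draft = stitch_pages(pages)                # 2. first-pass stitching
    cleaned = llm_cleanup(draft)               # 3. LLM post-processing
    return tts(cleaned)                        # 4. text-to-speech


# Stand-in stages so the sketch runs without any network calls.
fake_ocr = lambda photo: photo.decode("utf-8")
fake_llm = lambda text: text
fake_tts = lambda text: text.encode("utf-8")

photos = [b"It was a dark and stor-\nmy night.\n12", b"The rain fell."]
audio = scan_to_audio(photos, fake_ocr, fake_llm, fake_tts)
```

Passing the stages in as callables keeps the sketch independent of any particular OCR, LLM, or TTS provider; real clients would replace the three stand-ins.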

The hardest part of this process was actually getting the GPT calls right — I ended up writing a custom LLM eval framework for making sure the LLM wasn't making edits relative to the true text of the book.
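One way such a check could look, as a sketch only: compare the LLM output against a trusted reference transcription word by word and report every span that differs. This uses Python's standard `difflib`; the function names and the word-level tokenization are assumptions for illustration, not a description of the framework mentioned above.

```python
import difflib
import re
from typing import List, Tuple


def words(text: str) -> List[str]:
    """Split into word tokens, keeping apostrophes inside words (ain't, don't)."""
    return re.findall(r"\w+(?:'\w+)*", text)


def word_edits(reference: str, candidate: str) -> List[Tuple[str, str, str]]:
    """Return (operation, reference span, candidate span) for every difference."""
    ref, cand = words(reference), words(candidate)
    matcher = difflib.SequenceMatcher(a=ref, b=cand, autojunk=False)
    return [
        (tag, " ".join(ref[i1:i2]), " ".join(cand[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def edit_rate(reference: str, candidate: str) -> float:
    """Fraction of reference words touched by an insert, delete, or replace."""
    ref, cand = words(reference), words(candidate)
    matcher = difflib.SequenceMatcher(a=ref, b=cand, autojunk=False)
    changed = sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )
    return changed / max(len(ref), 1)


true_text = "I ain't got no time for that, he said."
llm_text = "I don't have any time for that, he said."
edits = word_edits(true_text, llm_text)
rate = edit_rate(true_text, llm_text)
```

Here `edits` flags the rewritten dialogue ("ain't got no" became "don't have any"), which is the kind of over-correction an eval set would need to catch. Punctuation is ignored by this tokenizer, so a stricter check would be needed if punctuation changes matter.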

A few issues remain, which I'll work on fixing if the app gets a significant amount of traction, including:

1) It can take multiple minutes to get audio back from a scan, especially if it's on the longer side (10+ pages). I'll be able to bring this down by spinning up dedicated servers for the OCR and TTS back-end.

2) The LLM sometimes does TOO good of a job at correcting "mistakes" in book text. This issue crops up particularly often when an author deliberately uses improper grammar, e.g. in dialogue.

The app is priced at $9.99/month for up to 250 pages/month right now, which I estimate will just about cover the costs of API calls. I'll be bringing the price point down as the pricing of the required AI APIs comes down. There's also a 3-day free trial if you want to try it out.
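For a sense of scale, the TTS figures quoted further down this thread (about $8 of Azure TTS for a 500-page book, versus roughly $100 on ElevenLabs for the same length) can be turned into per-page numbers. The sketch below uses only those quoted figures; OCR, LLM, and any other costs are not broken out in the thread and are left out, so this is not a full cost model.

```python
# Figures quoted in the thread; everything else is simple arithmetic.
PAGES_PER_BOOK = 500          # a full quota: 250 scans, 2 pages per scan
AZURE_TTS_USD = 8.00          # approximate Azure TTS bill for those pages
ELEVENLABS_TTS_USD = 100.00   # approximate ElevenLabs cost for the same length
SUBSCRIPTION_USD = 9.99       # monthly price


def per_page(total_usd: float, pages: int) -> float:
    """Average cost per page in USD."""
    return total_usd / pages


azure_per_page = per_page(AZURE_TTS_USD, PAGES_PER_BOOK)            # 0.016
elevenlabs_per_page = per_page(ELEVENLABS_TTS_USD, PAGES_PER_BOOK)  # 0.2
cost_ratio = ELEVENLABS_TTS_USD / AZURE_TTS_USD                     # 12.5
tts_share_of_price = AZURE_TTS_USD / SUBSCRIPTION_USD               # about 0.80

print(f"Azure TTS: ${azure_per_page:.3f}/page")
print(f"ElevenLabs: ${elevenlabs_per_page:.3f}/page ({cost_ratio:.1f}x)")
print(f"TTS alone is {tts_share_of_price:.0%} of the subscription at full quota")
```

On these numbers, TTS alone takes up most of the subscription price for a user who scans the full quota, before OCR and LLM calls are counted.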

If you do find this useful, or know somebody who might, I'd appreciate you giving it a try or letting them know! And please let me know if you have any feedback, including issues or feature requests.

spacemanspiff01 2 years ago

It seems to me that there are 3 independent issues.

1 scanning the books to text.

2 reading text to the user.

3 having a good interface.

Number 1 seems to be where you put the most effort, along with 3.

I guess at least for me, there are often digital copies of books, either in epub or Kindle. When that's available those should be used.

And if it is not available, wouldn't it make more sense to have a document scanner to epub?

I guess I'm just thinking that it is relatively rare that you really need document scanning in order to get an audiobook. Since most of the cost seems to come from the document scanning side, it seems worthwhile to split them up.

And also seems like it would make sense to think of these as 2 separate products. Specialized document scanning, and audio generation. I can definitely see uses for one without the other.

  • kolchinskiOP 2 years ago

    Yes, this is a very valid point; on a technical level it's definitely a 2-step thing. From a product perspective I'm framing it as "this app reads books out loud to you", but if I start hearing about people using it to grab the text out (which is possible right now), I'll definitely consider paying more attention to that use case.

LeoNatan25 2 years ago

“Scan up to 250 printed pages per month for $9.99/mo”

I’m sorry, but LOL. Not even a full book.

That has to be one of the most terrible business models. I guess it’s in line with most app subscription models these days, only much worse. And if the excuse is “well it costs me too much on Azure and the phone native APIs are not good enough”, perhaps the answer is “don’t do it then”. No thanks.

  • ctrlcrshr 2 years ago

    I literally made an account for the first time in 7 years of lurking to respond to this ridiculous comment.

    You're not the demo. So your no thanks is expected.

    But there are plenty of people with eye problems, as OP laid out, and I can guarantee you that a large number of them would be happy to pay $9.99 for the ability to actually consume and enjoy printed books that don't have pdf/eBook editions.

    But saying “don’t do it” is so off-base. It’s completely dismissing an entire sector of the market and humanity that isn’t like you.

    And perhaps that sort of comment, if given to a first time founder working on a tool that helped a swath of the population… they might see that and get discouraged.

    And then that tool that could legitimately improve people’s days, by allowing them to engage in parts of life they can’t easily access by themselves, wouldn’t exist.

    As someone who has spent my entire career in the realm of selling things, this is a solid business model that provides a great solution to a likely very underserved market.

    Plus the tech is cool and sounds fun to build. So, good work OP!

    • danmur 2 years ago

      They are saying it's too expensive. I think it's too expensive too, and although it's not my area I think the AI is unnecessary for that cost; not every solution needs to have the latest trend shoehorned into it. Presumably people with sight issues are like me in not wanting to pay extra for the technology used.

    • LeoNatan25 2 years ago

      > pay $9.99 for the ability to actually consume and enjoy printed books that don't have pdf/eBook editions

      You mean pay $9.99 for the ability to listen to half or less of a book, because there is a ridiculous 250 page per month limit.

      Yes, this is such a great solution for these people indeed.

broth 2 years ago

Love this but I have concerns with the price. You can usually find an audiobook corresponding to a paper book for relatively cheap. Services like Audible are a little more per month but you get more audio books. Given the 250 page per month limit at $9.99, how will this compete?

  • kolchinskiOP 2 years ago

    Yeah, my fiancée and I are also Audible users. This app is mostly for cases where an audiobook or even an e-book doesn't exist, like for older books you might get from a used bookstore or library. I'd prefer to set the price point lower, but if a user uses their 250-scan quota (which can be up to 500 pages, since you can scan 2 at a time) fully, I'll actually be losing about $10 on them that month, so I'm hoping not everyone uses their full quota!

    That said, I'm expecting OCR, LLM, and TTS API prices to continue coming down, at which point I'll be able to drop the price and raise the quota. Honestly I suspect iOS itself should be able to handle this use case well sooner or later, but until then, there's this app :).

moritz64 2 years ago

Is there something like this for epubs or pdfs with a truly high-quality TTS?

All apps that I know of use the iOS internal TTS (sounds awful, not as good as Siri). Then there is also Voice Dream Reader, and even with the paid premium voices it is still not pleasant to listen to. Siri-grade TTS or ElevenLabs would be pleasant enough, though.

ummonk 2 years ago

Were the onboard text recognition and speech synthesis APIs not good enough for this task?

  • kolchinskiOP 2 years ago

    iOS has pretty decent built-in OCR and TTS but give it a try on a book page if you're curious — unfortunately the OCR makes a lot of mistakes (as well as including footnotes etc.) and the TTS is still pretty robotic. I do hope and expect they'll improve soon, though, at which point the only advantage of this app will be that it can scan multiple pages at a time — probably not enough of an advantage to justify its existence at that point although I'll see what users say. For now, as far as I've seen it's the highest-quality option for this (granted, very niche) paper-to-audio use case.

    • lemming 2 years ago

      That’s interesting, my experience with the iOS OCR has been pretty good, but I haven’t used it for anything like this. What are you using instead?

      • kolchinskiOP 2 years ago

        Also Azure actually! Yeah for a sentence or two the onboard OCR usually gets it right, but if you’re listening to a few pages at a time there are almost always a bunch of errors and it gets pretty exasperating to listen to.

ssttoo 2 years ago

Next step: turn the book into a 3D video.

I recently read an Isaac Asimov book where he was describing a device that takes a book and acts it out for you. Made me think we’re probably pretty close.

closetkantian 2 years ago

Could you make a video showing how it works? I don't have any iOS devices but would love to recommend to friends/family. Thanks.

carbone_12 2 years ago

OP - this is an incredible project! I worked on something similar (https://oration.app) and really love your idea of using CV/OCR. I'll certainly be giving your app a try

rickcarlino 2 years ago

I have been looking for a product like this for years, I hope you can bring the price down eventually. In the past I used one of those OCR pens that you can find on Amazon but I found that they were too slow to be of practical use.

Very excited to see all the cool things people publish once LLM pricing drops.

aryamaan 2 years ago

If you don’t mind me asking, what do you use for TTS?

  • closetkantian 2 years ago

    Yes, I want to know too

    • kolchinskiOP 2 years ago

      It's Azure's TTS API — I'm using four of their voices.

      • riscy 2 years ago

        Why not use Siri / the native TTS solution on iOS?

        • gnicholas 2 years ago

          The native TTS is not great. It doesn't sound like Siri; it's much more robotic.

          • unfoldedCravat 2 years ago

            It can sound significantly better but there’s a couple hoops you have to jump through - and even then it’s decent, but not the same as Siri.

            You need the user to download ‘enhanced’ or ‘premium’ voices in the settings app. (Settings -> Accessibility-> Spoken Content -> Voices -> [Language of choice] -> [Voice of choice] -> Enhanced or Premium)

            In the app you have to search for the enhanced or premium voices when doing TTS.

            Here's an Objective-C example; I’m sure there’s an easier way to write it in Swift. https://github.com/osmandapp/OsmAnd-iOS/pull/1156/commits/0b...

            I’m not sure if you’ll find this acceptable from a UX point of view but there’s an option to play with if you’d like.

            • gnicholas 2 years ago

              Yeah, I use a premium voice but was still disappointed when we added the feature to my reader app. I decided to leave it in the app since we'd already built it at that point, but it's kind of a bummer since obviously they could use Siri-level TTS if they wanted to.

      • RockRobotRock 2 years ago

        Did you give any thought to ElevenLabs?

        • kolchinskiOP 2 years ago

          Yes, their quality is great but the cost is astronomical — I pay about $8 in Azure TTS bills alone for TTS-ing a 500-page book (what you can scan per month with a $10 subscription), whereas Eleven Labs would be about $100 for the same length. I found Azure to be the best bang-for-the-buck, although I'm on the lookout for more affordable high-quality TTS, which would also let me drop the price point of the app.

          • diogomqbm 2 years ago

            did you try the openAI pricing? how does it look?

            • kolchinskiOP 2 years ago

              Just took a look, their lower-quality model is almost exactly the same price as Azure TTS, and the quality is similar. Thanks for the pointer.

blatherard 2 years ago

Sounds cool, have you looked into potential copyright issues?

  • ghufran_syed 2 years ago

    I looked into a startup working on a similar problem - as long as the digital text and audio are for personal use, I think it should be ok (or at least, not worth going after). If it's possible to share with other users or post the output online, then I think there would be a problem - though unless it was being shared in the app, it's the distribution part that would probably attract adverse legal attention, not the scanning and ocr which has been around for a long time.

  • rockemsockem 2 years ago

    There's lots of precedent for format conversions, especially text to audio, being fair use. Accessibility lawsuits have set a lot of the rules of the road.

Gys 2 years ago

Love it! But should be for all languages

  • kolchinskiOP 2 years ago

    All languages will be tough but adding common ones that are already supported by off-the-shelf OCR, TTS, and LLMs is definitely doable; do you have any that you're most keen on?

    • Gys 2 years ago

      I know someone who will love this for Dutch.

      You might also consider adding a translation option, for example to read a French text and have it immediately translated to English.

tamimio 2 years ago

> Turn any book into an audiobook

English book.

  • kolchinskiOP 2 years ago

    Yes, true (for now) — which language(s) are you looking for?

    The APIs I'm using are mostly multilingual so it'd be doable to extend to other languages.

    • tamimio 2 years ago

      That would be great! Nothing specific, but it would be a nice-to-have feature if ever implemented.

      • kolchinskiOP 2 years ago

        Makes sense. One other interesting use case would be adding translation to the middle of the pipeline, so that you could scan a book in a language you don't speak and have it read to you in a language you do speak.

quickthrower2 2 years ago

Funny. Felt like another (eyeroll) AI thing, until I read your story here. So definitely use this story in your marketing too! Also the story gives the impression of attention to detail because of why you did it, which is good to know.
