Show HN: ReadToMe (iOS) turns paper books into audio
readtome-app.com
Today I'm publicly launching something that started as a side project: ReadToMe, an iPhone app that turns paper books and other printed text into audio.
Originally this was a Christmas present for my fiancée, who loves books but has an eye problem that makes it hard for her to read more than a few pages at a time. She mostly listens to audiobooks while following along with the paper book, but some books aren't available in audiobook or even e-book form, and all of the existing apps we tried were surprisingly bad at scanning paper books into audio — they make lots of mistakes, include footnotes and page numbers, etc., in a way that really degrades the experience.
Being an AI-oriented engineer by training, I had a crack at solving the problem myself, and was pleasantly surprised at how well the proof of concept worked. I then had some time free while shutting down my previous company (Mezli, YC W21), during which I polished up the app to the point you see it at now.
The way it works:
On the front end, it's a SwiftUI app (mostly written by ChatGPT!) that consists mostly of a document scanner (VNDocumentCameraViewController) and a custom-built audio player.
The back end is more complex — book photos are first sent to an OCR API, then some custom code I wrote does a first pass at stitching together and correcting the results. Then, the corrected OCR results are sent to GPT-3.5-turbo for further post-processing and re-stitching together, and finally to a text-to-speech API for conversion to audio.
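The post doesn't show the custom stitching code, so the following is only a minimal sketch of what a first pass like that might look like, assuming the OCR step returns one plain-text string per page. The names `clean_page` and `stitch_pages` are hypothetical, and the rules (drop lines that are only a page number, re-join words hyphenated across a line or page break) are illustrative assumptions rather than the app's actual logic.

```python
import re

# Assumption: a line containing only 1-4 digits is a page number.
PAGE_NUMBER = re.compile(r"^\s*\d{1,4}\s*$")


def clean_page(page_text: str) -> list[str]:
    """Return the non-empty lines of one OCR'd page, minus bare page numbers."""
    lines = [line.strip() for line in page_text.splitlines()]
    return [line for line in lines if line and not PAGE_NUMBER.match(line)]


def stitch_pages(pages: list[str]) -> str:
    """Join OCR'd pages into one running text.

    A trailing hyphen followed by a lowercase continuation is treated as a
    word split across a line or page break and is re-joined.
    """
    out = ""
    for page in pages:
        for line in clean_page(page):
            if out.endswith("-") and line[0].islower():
                out = out[:-1] + line      # "stor-" + "my night." -> "stormy night."
            elif out:
                out += " " + line
            else:
                out = line
    return out


pages = [
    "It was a dark and stor-\nmy night.\n12",
    "The rain fell in tor-\nrents.",
]
stitched = stitch_pages(pages)
```

A heuristic like this will wrongly merge genuine hyphenated compounds that happen to break at a line end, and it does nothing about footnotes, which is presumably part of why a later LLM pass is useful.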
The hardest part of this process was actually getting the GPT calls right — I ended up writing a custom LLM eval framework for making sure the LLM wasn't making edits relative to the true text of the book.
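The eval framework itself isn't described in the post. One minimal way to check that an LLM pass hasn't rewritten the book is to diff its output against a hand-verified transcription at the word level; the sketch below does that with Python's standard `difflib`. The function names and the choice to ignore case and punctuation are assumptions made for illustration.

```python
import difflib
import re


def words(text: str) -> list[str]:
    """Lowercase word tokens; punctuation and spacing differences are ignored.

    Only the ASCII apostrophe is kept inside words in this sketch.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def word_changes(reference: str, candidate: str) -> list[tuple[str, str, str]]:
    """List (operation, reference words, candidate words) for each differing span."""
    ref, cand = words(reference), words(candidate)
    matcher = difflib.SequenceMatcher(a=ref, b=cand, autojunk=False)
    return [
        (op, " ".join(ref[i1:i2]), " ".join(cand[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


def fidelity(reference: str, candidate: str) -> float:
    """Word-level similarity in [0, 1]; 1.0 means no words were changed."""
    ref, cand = words(reference), words(candidate)
    return difflib.SequenceMatcher(a=ref, b=cand, autojunk=False).ratio()


reference = "I ain't got no money, he said."
llm_output = "I don't have any money, he said."
changes = word_changes(reference, llm_output)
# changes == [("replace", "ain't got no", "don't have any")]
```

Run over a set of reference pages, a check like this can flag prompts or model versions that "fix" deliberately nonstandard grammar, since any replaced span shows up explicitly instead of being hidden inside an aggregate score.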
A few issues remain, which I'll work on fixing if the app gets a significant amount of traction, including:
1) It can take multiple minutes to get audio back from a scan, especially if it's on the longer side (10+ pages). I'll be able to bring this down by spinning up dedicated servers for the OCR and TTS back-end.
2) The LLM sometimes does TOO good of a job at correcting "mistakes" in book text. This issue crops up particularly often when an author deliberately uses improper grammar, e.g. in dialogue.
The app is priced at $9.99/month for up to 250 pages/month right now, which I estimate will just about cover the costs of API calls. I'll be bringing the price point down as the pricing of the required AI APIs comes down. There's also a 3-day free trial if you want to try it out.
If you do find this useful, or know somebody who might, I'd appreciate you giving it a try or letting them know! And please let me know if you have any feedback, including issues or feature requests.

It seems to me that there are 3 independent issues: 1) scanning the books to text, 2) reading text to the user, 3) having a good interface. Number 1 seems to be where you put the most effort, along with 3. I guess at least for me, there are often digital copies of books, either in epub or Kindle. When that's available those should be used. And if it is not available, wouldn't it make more sense to have document scanner to epub? I guess I'm just thinking that it is relatively rare that you really need document scanning in order to get an audiobook. Since most of the cost seems to be from the document scanner side, it seems worthwhile to split them up. It also seems like it would make sense to think of these as 2 separate products: specialized document scanning, and audio generation. I can definitely see uses for one without the other.

Yes this is a very valid point, on a technical level it's definitely a 2-step thing. From a product perspective I'm framing it as "this app reads books out loud to you" but if I start hearing about people using it to grab the text out (which is possible right now) I'll definitely consider paying more attention to that use case.

“Scan up to 250 printed pages per month for $9.99/mo”
I’m sorry, but LOL. Not even a full book. That has to be one of the most terrible business models. I guess it’s in line with most app subscription models these days, only much worse. And if the excuse is “well it costs me too much on Azure and the phone native APIs are not good enough”, perhaps the answer is “don’t do it then”. No thanks.

I literally made an account for the first time in 7 years of lurking to respond to this ridiculous comment. You're not the demo. So your no thanks is expected.
But, there are plenty of people with eye problems as OP laid out, and I can guarantee you that a large number of them would be happy to pay $9.99 for the ability to actually consume and enjoy printed books that don't have pdf/eBook editions. But saying “don’t do it” is so off-base. It’s completely dismissing an entire sector of the market and humanity that isn’t like you. And perhaps that sort of comment, if given to a first time founder working on a tool that helped a swath of the population… they might see that and get discouraged. And then that tool that could legitimately improve people’s days, by allowing them to engage in parts of life they can’t easily access by themselves, wouldn’t exist. As someone who has spent my entire career in the realm of selling things, this is a solid business model that provides a great solution to a likely very underserved market. Plus the tech is cool and sounds fun to build. So, good work OP!

They are saying it's too expensive. I think it's too expensive too, and although it's not my area I think the AI is unnecessary for that cost; not every solution needs to have the latest trend shoehorned into it. Presumably people with sight issues are like me in not wanting to pay extra for the technology used.

> pay $9.99 for the ability to actually consume and enjoy printed books that don't have pdf/eBook editions
You mean pay $9.99 for the ability to listen to half or less of a book, because there is a ridiculous 250 page per month limit. Yes, this is such a great solution for these people indeed.

Love this but I have concerns with the price. You can usually find an audiobook corresponding to a paper book for relatively cheap. Services like Audible are a little more per month but you get more audiobooks. Given the 250 page per month limit at $9.99, how will this compete?

Yeah, my fiancée and I are also Audible users; this app is mostly for cases where an audiobook or even an e-book doesn't exist, like for older books you might get from a used bookstore or library. I'd prefer to set the price point lower, but if a user fully uses their 250-scan quota (which can be up to 500 pages, since you can scan 2 at a time), I'll actually be losing about $10 on them that month, so I'm hoping not everyone uses their full quota! That said, I'm expecting OCR, LLM, and TTS API prices to continue coming down, at which point I'll be able to drop the price and raise the quota. Honestly I suspect iOS itself should be able to handle this use case well sooner or later, but until then, there's this app :).

Is there something like this for epubs or pdfs with a truly high-quality TTS? All apps that I know of use iOS internal TTS (sounds awful, not as good as Siri). Then there is also Voice Dream Reader, and even with the paid premium voices it is still not pleasant to listen to. Siri-grade TTS or ElevenLabs would be pleasant enough, though.

Check out Speechify and NaturalReader, IIRC they’re two of the most popular apps for that use case and I remember their voices were pretty solid.

Apple's own https://authors.apple.com/support/4519-digital-narration-aud... of course you can't generate audio for books purchased elsewhere because Apple

Narakeet can read EPUB and a bunch of other formats using realistic TTS - see https://www.narakeet.com/create/text-to-voice-audiobooks.htm...

I worked on something like this if you'd like to give it a try!
https://oration.app

I built this for myself, the main bottleneck is that TTS prices are too high for an entire book, and open source ones aren't good enough yet.

I'm almost finished building this, stay tuned.

Were the onboard text recognition and speech synthesis APIs not good enough for this task?

iOS has pretty decent built-in OCR and TTS, but give it a try on a book page if you're curious. Unfortunately the OCR makes a lot of mistakes (as well as including footnotes etc.) and the TTS is still pretty robotic. I do hope and expect they'll improve soon, though, at which point the only advantage of this app will be that it can scan multiple pages at a time, which is probably not enough of an advantage to justify its existence at that point, although I'll see what users say. For now, as far as I've seen it's the highest-quality option for this (granted, very niche) paper-to-audio use case.

That’s interesting, my experience with the iOS OCR has been pretty good, but I haven’t used it for anything like this. What are you using instead?

Also Azure actually! Yeah for a sentence or two the onboard OCR usually gets it right, but if you’re listening to a few pages at a time there are almost always a bunch of errors and it gets pretty exasperating to listen to.

Next step: turn the book into a 3D video. I recently read an Isaac Asimov book where he was describing a device that takes a book and acts it out for you. Made me think we’re probably pretty close.

Sounds like a Torment Nexus waiting to be created ;)

Could you make a video showing how it works? I don't have any iOS devices but would love to recommend to friends/family. Thanks.

Good idea thanks, let me figure out how to do that.

OP - this is an incredible project!
I worked on something similar (https://oration.app) and really love your idea of using CV/OCR.
I'll certainly be giving your app a try.

I have been looking for a product like this for years, I hope you can bring the price down eventually. In the past I used one of those OCR pens that you can find on Amazon but I found that they were too slow to be of practical use.

Very excited to see all the cool things people publish once LLM pricing drops.

If you don’t mind me asking, what do you use for TTS?

Yes, I want to know too.

It's Azure's TTS API; I'm using four of their voices.

Why not use Siri / the native TTS solution on iOS?

The native TTS is not great. It doesn't sound like Siri; it's much more robotic.

It can sound significantly better but there are a couple of hoops you have to jump through, and even then it’s decent, but not the same as Siri. You need the user to download ‘enhanced’ or ‘premium’ voices in the Settings app.
(Settings -> Accessibility -> Spoken Content -> Voices -> [Language of choice] -> [Voice of choice] -> Enhanced or Premium)
In the app you have to search for the enhanced or premium voices when doing TTS. Here's an Objective-C example; I’m sure there’s an easier way to write it in Swift.
https://github.com/osmandapp/OsmAnd-iOS/pull/1156/commits/0b...
I’m not sure if you’ll find this acceptable from a UX point of view but there’s an option to play with if you’d like.

Yeah, I use a premium voice but was still disappointed when we added the feature to my reader app. I decided to leave it in the app since we'd already built it at that point, but it's kind of a bummer since obviously they could use Siri-level TTS if they wanted to.

Did you give any thought to ElevenLabs?

Yes, their quality is great but the cost is astronomical: I pay about $8 in Azure TTS bills alone for TTS-ing a 500-page book (what you can scan per month with a $10 subscription), whereas ElevenLabs would be about $100 for the same length. I found Azure to be the best bang-for-the-buck, although I'm on the lookout for more affordable high-quality TTS, which would also let me drop the price point of the app.

Did you try the OpenAI pricing? How does it look?

Just took a look, their lower-quality model is almost exactly the same price as Azure TTS, and the quality is similar. Thanks for the pointer.

Sounds cool, have you looked into potential copyright issues?

I looked into a startup working on a similar problem - as long as the digital text and audio are for personal use, I think it should be ok (or at least, not worth going after). If it's possible to share with other users or post the output online, then I think there would be a problem - though unless it was being shared in the app, it's the distribution part that would probably attract adverse legal attention, not the scanning and OCR, which have been around for a long time.

There's lots of precedent for format conversions, especially text to audio, being fair use. Accessibility lawsuits have set a lot of the rules of the road.

Love it!
But should be for all languages

All languages will be tough but adding common ones that are already supported by off-the-shelf OCR, TTS, and LLMs is definitely doable; do you have any that you're most keen on?

I know someone who will love this for Dutch. You might also consider adding a translation option, for example to read a French text which will immediately be translated to English.

> Turn any book into an audiobook
English book.

Yes, true (for now). Which language(s) are you looking for? The APIs I'm using are mostly multilingual so it'd be doable to extend to other languages.

That would be great! Nothing specific, but it would be a nice-to-have feature if ever implemented.

Makes sense, one other interesting use case would be looping in translation into the middle of the pipeline so that you could scan a book in a language you don't speak and have it read to you in a language you do speak.

Funny. Felt like another (eyeroll) AI thing, until I read your story here. So definitely use this story in your marketing too! Also the story gives the impression of attention to detail because of why you did it, which is good to know.