Show HN: Estimated Reading Time API
klopets.com

To those folks saying, "this is easy to code up in JavaScript" or "Pelican already does this for you" -- does it do that for other people's content?
I like this because I can just plug in the URL of any old article I might want to read and see what I'm getting myself into, e.g. http://klopets.com/readtime/?url=http://www.newyorker.com/ma.... Now that I know it's a 15 minute read, I'll probably save it for later.
This would be great as a browser plugin.
I like your username...
You are right, and thanks for pointing that out. I can generally tell how long it will take to read an article with a little scroll and a glance at the scroll bar. If the scroll bar is small, reading is going to take a while and vice versa.
Unless you have sites full of Facebook comments and bullshit Taboola content at the bottom.
Great to hear that you have this problem, because in the evenings I'm working on a little extension to Pocket that automatically groups saved links by content length :) So it's not only me who has this problem, heh.
I have a lot of quality content marked for later and not knowing how long it is (especially when e.g. commuting to work when I have 10 minutes only) is rather annoying. And managing the list isn't something I fancy doing myself (we have computers for that!).
Drop me an email (in my profile) if you want to talk a bit more about this problem.
Is this especially good? Does it use a better algorithm or something?
Just asking because there's no explanation, and it would probably be better to hack something together in JS than to depend on this probably-soon-to-vanish API. There's already https://eager.io/app/reading-time, for example, which anyone can install in 2 minutes, and it seems to be based on a simple algorithm[1].
[1]: https://github.com/TeffenEllis/reading-time/blob/master/app....
Why do we need an API for this? A library seems easier, and even that is a stretch for doing `$wordcount / 200 = minutes required to read`. Does this make an estimate of the article's complexity and adjust how many words a person reads per minute?
(Source for 200 words per minute: https://en.wikipedia.org/wiki/Words_per_minute#Reading_and_c... )
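The `$wordcount / 200` formula the parent describes can be sketched in a few lines of Python (the 200 wpm figure is from the Wikipedia source above; rounding up is my own choice, not something the API is known to do):

```python
import math

WORDS_PER_MINUTE = 200  # average adult reading speed, per the Wikipedia figure above

def reading_time_minutes(text: str, wpm: int = WORDS_PER_MINUTE) -> int:
    """Naive estimate: word count divided by reading speed, rounded up."""
    words = len(text.split())
    return math.ceil(words / wpm)
```

A 1,000-word article comes out to 5 minutes, which is roughly what most reading-time widgets report.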
We definitely don't need an API. Then again, we basically don't need most APIs. This doesn't estimate the article's complexity as of now, but its main point is not just getting the length of the entire site, but locating the most likely main content area and THEN doing the /250.
You can use something like:
https://github.com/grangier/python-goose
https://pypi.python.org/pypi/textstat/
plus word counts adjusted for reading ease.
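Glued together, that pipeline might look something like this (a Python sketch; the reading-ease breakpoints and scaling factors are illustrative guesses, not anything this API is known to use -- a library like textstat would supply the actual reading-ease score):

```python
import math

def adjusted_wpm(reading_ease: float, base_wpm: int = 250) -> int:
    """Scale reading speed by a Flesch reading-ease score (0-100).
    Easy text (score ~90) reads faster than dense text (score ~30).
    The breakpoints below are illustrative guesses, not calibrated values."""
    if reading_ease >= 80:       # very easy prose
        return int(base_wpm * 1.2)
    if reading_ease >= 50:       # average difficulty
        return base_wpm
    return int(base_wpm * 0.75)  # dense / academic text

def estimate_minutes(word_count: int, reading_ease: float) -> int:
    """Reading time for the extracted main content, rounded up."""
    return math.ceil(word_count / adjusted_wpm(reading_ease))
```

The content-extraction step (what python-goose does) is the hard part; once you have the cleaned main text, the word count and the ease adjustment are trivial.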
Remember that URLs don't always point to websites:
Aaaaand now there are tens of IPs trying to access /etc/passwd. Tailing my "failed hack attempts" log is kinda fun now.
But if you wrote this to warn me, then thanks!
Also be careful about redirect handling: http://evil.com might redirect you to file:///etc/passwd
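A minimal guard against the `file://` and internal-host cases above might look like this (a Python sketch only; real SSRF protection would also need to re-run this check on every redirect hop, exactly because http://evil.com can 302 somewhere unsafe):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject URLs that could reach local files or internal hosts."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # blocks file://, ftp://, gopher://, etc.
    host = parsed.hostname
    if not host:
        return False
    try:
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return False
    # Block loopback, private, and link-local ranges (127.0.0.1,
    # 10.x/192.168.x, 169.254.x.x cloud metadata endpoints, etc.)
    return not (addr.is_loopback or addr.is_private or addr.is_link_local)
```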
> But if you wrote this to warn me, then thanks!
I did.
You're not the first person to make that kind of mistake, and I assumed it was an obvious enough "attack" that trying to communicate it privately wasn't required.
Though I now have an extra if statement in my code to detect and log this type of 'hacking' attempt in addition to some others, the code was never vulnerable to this in the first place. No file contents are displayed at any time anyway.
There is a plugin[1] for this for people who blog with Pelican. It will also score your Flesch-Kincaid[2] values. You can see it in action on my blog: http://caffeineindustries.com/
I do think I should adjust the words per minute values...
[1] https://github.com/getpelican/pelican-plugins/tree/master/po... [2] http://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readabil...
Curious as to why the reading time is estimated as 2 minutes and 96 seconds. Wondering if it isn't supposed to be 3 minutes and 36 seconds?
I plugged one of my own posts in and it read 3 minutes and 166 seconds (5 minutes and 46 seconds).
It's not showing x minutes AND y seconds, it's x minutes OR y seconds. Minutes is just there for someone who doesn't need precision and can let the API handle the rounding. Seconds is for people who want to do more advanced stuff.
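So the two fields are the same estimate at different precisions; a sketch of the rounding being described (my guess at the rule, consistent with the numbers the parent comments saw):

```python
def to_minutes(seconds: int) -> int:
    """Collapse a seconds estimate into whole minutes: the API's
    'minutes' field is a rounded view of its 'seconds' field,
    not a remainder to be added to it."""
    return round(seconds / 60)
```

96 seconds is about 1.6 minutes, which rounds to 2; 166 seconds is about 2.8 minutes, which rounds to 3 -- matching the "2 minutes / 96 seconds" and "3 minutes / 166 seconds" pairs above.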
I like the idea however it is confused by messy HTML such as: http://klopets.com/readtime/?url=http://www.nytimes.com/2016...
I'm aware of it, thanks. The current approach isn't really that flexible. For example, I've seen the NY Times and The Atlantic not working. I've considered some different potential fixes but haven't implemented them yet. Thanks!
Nice demo! Is this code for this available online? I'd love to see how this works.
This is just that - a demo right now. I'll most likely keep this as a pet project of mine, trying to make the underlying algorithm a bit more intelligent. I'll then probably release it on GitHub (https://github.com/mklopets).
FYI: Made a similar npm module a while ago if you want this functionality locally: https://github.com/hswolff/read-time
Thanks for sharing! The point here isn't just calculating the reading time from a WPM metric; it's fetching a remote page, analyzing it to find the main content, and then doing the maths, taking into account any images, among other things.
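The image adjustment mentioned above might be as simple as a flat per-image allowance (a sketch; the comment doesn't say how the API actually weights images, so `seconds_per_image` here is a made-up constant):

```python
import math

def reading_seconds(word_count: int, image_count: int,
                    wpm: int = 250, seconds_per_image: int = 10) -> int:
    """Word-based reading time plus a flat allowance per image.
    seconds_per_image is an illustrative guess, not the API's value."""
    text_seconds = word_count / wpm * 60
    return math.ceil(text_seconds + image_count * seconds_per_image)
```

At 250 wpm, 250 words of text is 60 seconds; three images at 10 seconds each would bring that to 90.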
A bit too simple for an API, reminds me of Fuck Off As A Service. http://www.foaas.com/
Simplicity is a good thing. A service that attempts to do one thing well.
Not when the HTTP request itself is more complicated than just reimplementing the thing in JavaScript.
NOTE: If you are using a library for easily doing HTTP requests, then you can probably use a library for estimating time to read.
I like the notion that a service can continue to improve under the covers without me needing to do software updates to get a new version of the library.
I know you can just include an external JavaScript library but I only do that for sources I trust.
The service can also be discontinued. And that is what most often happens. Big improvements like the ones you're imagining are rare.
We've officially reached Peak API. What's the algo?