Show HN: Estimated Reading Time API
klopets.com

To those folks saying, "this is easy to code up in JavaScript" or "Pelican already does this for you" -- does it do that for other people's content?
I like this because I can just plug in the URL of any old article I might want to read and see what I'm getting myself into, e.g. http://klopets.com/readtime/?url=http://www.newyorker.com/ma.... Now that I know it's a 15 minute read, I'll probably save it for later.
This would be great as a browser plugin.
I like your username...
You are right, and thanks for pointing that out. I can generally tell how long it will take to read an article with a little scroll and a glance at the scroll bar. If the scroll bar is small, reading is going to take a while and vice versa.
Unless you have sites full of Facebook comments and bullshit Taboola content at the bottom.
Great to hear that you have this problem, because in the evenings I'm working on a little extension to Pocket that automatically groups saved links by content length :) So it's not only me who has this problem, heh.
I have a lot of quality content marked for later and not knowing how long it is (especially when e.g. commuting to work when I have 10 minutes only) is rather annoying. And managing the list isn't something I fancy doing myself (we have computers for that!).
Drop me an email (in my profile) if you want to talk a bit more about this problem.
Is this especially good? Does it use a better algorithm or something?
Just asking because there's no explanation, and it would probably be better to hack something together in JS than to depend on this probably-soon-to-vanish API. There's already https://eager.io/app/reading-time, for example, which anyone can install in 2 minutes, and it seems to be based on a simple algorithm[1].
[1]: https://github.com/TeffenEllis/reading-time/blob/master/app....
Why do we need an API for this? A library seems easier, and even that is a stretch for doing `$wordcount / 200 = minutes required to read`. Does this make an estimate of the article's complexity and adjust how many words a person reads per minute?
(Source for 200 words per minute: https://en.wikipedia.org/wiki/Words_per_minute#Reading_and_c... )
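The `$wordcount / 200` formula the parent describes can be sketched in a few lines of Python (the 200 wpm figure is from the Wikipedia source above; rounding up is my own choice, not something the API is known to do):

```python
import math

WORDS_PER_MINUTE = 200  # average adult reading speed, per the Wikipedia figure above

def reading_time_minutes(text: str, wpm: int = WORDS_PER_MINUTE) -> int:
    """Naive estimate: word count divided by reading speed, rounded up."""
    words = len(text.split())
    return math.ceil(words / wpm)
```

A 1,000-word article comes out to 5 minutes, which is roughly what most reading-time widgets report.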
We definitely don't need an API. Then again, we basically don't need most APIs. This doesn't estimate the article's complexity as of now, but its main point is not just getting the length of the entire site, but locating the most likely main content area and THEN doing the /250.
You can use something like:
https://github.com/grangier/python-goose
https://pypi.python.org/pypi/textstat/
plus word counts adjusted for reading ease.
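Glued together, that pipeline might look something like this (a Python sketch; the reading-ease breakpoints and scaling factors are illustrative guesses, not anything this API is known to use -- a library like textstat would supply the actual reading-ease score):

```python
import math

def adjusted_wpm(reading_ease: float, base_wpm: int = 250) -> int:
    """Scale reading speed by a Flesch reading-ease score (0-100).
    Easy text (score ~90) reads faster than dense text (score ~30).
    The breakpoints below are illustrative guesses, not calibrated values."""
    if reading_ease >= 80:       # very easy prose
        return int(base_wpm * 1.2)
    if reading_ease >= 50:       # average difficulty
        return base_wpm
    return int(base_wpm * 0.75)  # dense / academic text

def estimate_minutes(word_count: int, reading_ease: float) -> int:
    """Reading time for the extracted main content, rounded up."""
    return math.ceil(word_count / adjusted_wpm(reading_ease))
```

The content-extraction step (what python-goose does) is the hard part; once you have the cleaned main text, the word count and the ease adjustment are trivial.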
Remember that URLs don't always point to websites:
Aaaaand now there are tens of IPs trying to access /etc/passwd. Tailing my "failed hack attempts" log is kinda fun now.
But if you wrote this to warn me, then thanks!
Also be careful about redirect handling: http://evil.com might redirect you to file:///etc/passwd
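A minimal guard against the `file://` and internal-host cases above might look like this (a Python sketch only; real SSRF protection would also need to re-run this check on every redirect hop, exactly because http://evil.com can 302 somewhere unsafe):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    """Reject URLs that could reach local files or internal hosts."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # blocks file://, ftp://, gopher://, etc.
    host = parsed.hostname
    if not host:
        return False
    try:
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return False
    # Block loopback, private, and link-local ranges (127.0.0.1,
    # 10.x/192.168.x, 169.254.x.x cloud metadata endpoints, etc.)
    return not (addr.is_loopback or addr.is_private or addr.is_link_local)
```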
> But if you wrote this to warn me, then thanks!
I did.
You're not the first person to make that kind of mistake, and I assumed it was an obvious enough "attack" that trying to communicate it privately wasn't required.
Though I now have an extra if statement in my code to detect and log this type of 'hacking' attempt in addition to some others, the code was never vulnerable to this in the first place. No file contents are displayed at any time anyway.
There is a plugin[1] for this for people who blog with Pelican. It will also score your Flesch-Kincaid[2] values. You can see it in action on my blog: http://caffeineindustries.com/
I do think I should adjust the words per minute values...
[1] https://github.com/getpelican/pelican-plugins/tree/master/po... [2] http://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readabil...
Curious as to why the reading time is estimated as 2 minutes and 96 seconds. Wondering if it isn't supposed to be 3 minutes and 36 seconds?
I plugged one of my own posts in and it read 3 minutes and 166 seconds (5 minutes and 46 seconds).
It's not showing x minutes AND y seconds, it's x minutes OR y seconds. Minutes is just there for someone who doesn't need precision and can let the API handle the rounding. Seconds is for people who want to do more advanced stuff.
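So the two fields are the same estimate at different precisions; a sketch of the rounding being described (my guess at the rule, consistent with the numbers the parent comments saw):

```python
def to_minutes(seconds: int) -> int:
    """Collapse a seconds estimate into whole minutes: the API's
    'minutes' field is a rounded view of its 'seconds' field,
    not a remainder to be added to it."""
    return round(seconds / 60)
```

96 seconds is about 1.6 minutes, which rounds to 2; 166 seconds is about 2.8 minutes, which rounds to 3 -- matching the "2 minutes / 96 seconds" and "3 minutes / 166 seconds" pairs above.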
I like the idea however it is confused by messy HTML such as: http://klopets.com/readtime/?url=http://www.nytimes.com/2016...
I'm aware of it, thanks. The current approach isn't really that flexible. For example, I've seen the NY Times and The Atlantic not working. I've considered some different potential fixes but haven't implemented them yet. Thanks!
Nice demo! Is this code for this available online? I'd love to see how this works.
This is just that - a demo right now. I'll most likely keep this as a pet project of mine, trying to make the underlying algorithm a bit more intelligent. I'll then probably release it on GitHub (https://github.com/mklopets).
FYI: Made a similar npm module a while ago if you want this functionality locally: https://github.com/hswolff/read-time
Thanks for sharing! The point here isn't just calculating the reading time from a WPM metric; it's fetching a remote page, analyzing it to find the main content, and then doing the maths, taking into account any images, among other things.
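The image adjustment mentioned above might be as simple as a flat per-image allowance (a sketch; the comment doesn't say how the API actually weights images, so `seconds_per_image` here is a made-up constant):

```python
import math

def reading_seconds(word_count: int, image_count: int,
                    wpm: int = 250, seconds_per_image: int = 10) -> int:
    """Word-based reading time plus a flat allowance per image.
    seconds_per_image is an illustrative guess, not the API's value."""
    text_seconds = word_count / wpm * 60
    return math.ceil(text_seconds + image_count * seconds_per_image)
```

At 250 wpm, 250 words of text is 60 seconds; three images at 10 seconds each would bring that to 90.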
A bit too simple for an API, reminds me of Fuck Off As A Service. http://www.foaas.com/
Simplicity is a good thing. A service that attempts to do one thing well.
Not when the HTTP request itself is more complicated than just reimplementing the thing in JavaScript.
NOTE: If you are using a library for easily doing HTTP requests, then you can probably use a library for estimating time to read.
I like the notion that a service can continue to improve under the covers without me needing to do software updates to get a new version of the library.
I know you can just include an external JavaScript library but I only do that for sources I trust.
The service can also be discontinued. And that is what most often happens. Big improvements like the ones you're imagining are rare.
We've officially reached Peak API. What's the algo?