How to analyse text in Python?

Waqas Younas

Photo by Aaron Burden on Unsplash

I am interested in writing. I write a blog, and sometimes op-eds on issues I am passionate about. I am not a native English speaker, so I am always looking to learn more and get better at writing.

A couple of months back, I finished reading Steven Pinker’s book “The Sense of Style.” Around the same time, I read a research study published by the Stanford Literary Lab. Both resources gave me insights into how to write cogently and lucidly, and these insights inspired me to write a text analyzer to help improve my writing.

Eventually, I created a text analyzer in Python. It is called Homer and is available for free on GitHub. I had found it useful, so I decided to open-source it, thinking others might find it useful too. Homer provides overall text statistics as well as paragraph-level stats.

The text stats include the following:

  • The length of time, on average, required to read the article.
  • A readability score (using both the Flesch Reading Ease and Dale-Chall readability algorithms).
  • The total number of paragraphs, sentences and words.
  • The frequency of the word “and” in the text (the fewer the better).
  • List of vague words in the text.
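
To make these statistics concrete, here is a minimal sketch of how they could be computed with the Python standard library. It is not Homer's actual code: the reading speed of 200 words per minute, the small vague-word list, the regex-based sentence splitter, the syllable estimator and all function names are assumptions made for illustration. Only the Flesch Reading Ease formula itself is the standard published one, and the Dale-Chall score is left out because it needs a list of familiar words.

```python
import re

# Assumed values for illustration; not taken from Homer.
WORDS_PER_MINUTE = 200
VAGUE_WORDS = {"very", "really", "quite", "somewhat", "various", "stuff", "things"}


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (a rough heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(text):
    """Return lowercase word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def count_syllables(word):
    """Estimate syllables by counting vowel groups (heuristic, not exact)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent 'e' as non-syllabic, e.g. "stone".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text):
    """Standard Flesch Reading Ease formula; higher means easier to read."""
    sentences = split_sentences(text)
    words = split_words(text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def text_stats(text):
    """Collect the overall statistics for a whole text."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    words = split_words(text)
    return {
        "reading_time_minutes": len(words) / WORDS_PER_MINUTE,
        "flesch_reading_ease": flesch_reading_ease(text),
        "paragraphs": len(paragraphs),
        "sentences": len(split_sentences(text)),
        "words": len(words),
        "and_count": words.count("and"),
        "vague_words": sorted(set(words) & VAGUE_WORDS),
    }
```

Calling `text_stats(open("draft.txt").read())` on a hypothetical file `draft.txt` would return a dictionary with one entry per statistic. The syllable estimate is the weakest part of the sketch, so the readability score should be read as approximate.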

The paragraph-level stats include:

  • The number of sentences and words in the paragraph.
  • Average words per sentence.
  • The longest sentence in the paragraph.
  • Readability scores (Flesch Reading Ease and Dale-Chall).
  • Warning if the number of sentences exceeds a certain threshold in a paragraph.
  • Warning if the number of words exceeds a specific limit in sentences.
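
The paragraph-level checks can be sketched in the same spirit. Again, this is only an illustration under stated assumptions, not Homer's implementation: the two thresholds, the function name `paragraph_stats` and the regex-based splitting are all hypothetical, and the readability scores are omitted to keep the example short.

```python
import re

# Hypothetical thresholds for illustration; Homer's actual limits may differ.
MAX_SENTENCES_PER_PARAGRAPH = 5
MAX_WORDS_PER_SENTENCE = 25


def paragraph_stats(paragraph):
    """Compute simple statistics and warnings for one paragraph."""
    # Split after '.', '!' or '?' followed by whitespace (a rough heuristic).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    word_counts = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    total_words = sum(word_counts)

    # The longest sentence is the one with the most words.
    longest = ""
    if sentences:
        longest_index = max(range(len(sentences)), key=lambda i: word_counts[i])
        longest = sentences[longest_index]

    warnings = []
    if len(sentences) > MAX_SENTENCES_PER_PARAGRAPH:
        warnings.append(f"paragraph has {len(sentences)} sentences")
    for count in word_counts:
        if count > MAX_WORDS_PER_SENTENCE:
            warnings.append(f"sentence has {count} words")

    return {
        "sentences": len(sentences),
        "words": total_words,
        "avg_words_per_sentence": total_words / len(sentences) if sentences else 0.0,
        "longest_sentence": longest,
        "warnings": warnings,
    }
```

Keeping the thresholds as module-level constants makes them easy to tune, which matters because a limit that suits a blog post may be too strict for an essay.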

I use Homer as a guide to write as clearly as I can. The idea is not to let Homer control your writing and limit your creative freedom. I do not use it that way. For example, Homer warns when it finds longer sentences. But an essay in which all sentences are short could be displeasing to both mind and ear, so in some cases I ignore Homer’s warning. Still, I have formed a habit of reading all its stats, pausing and reflecting on everything it highlights.

I hope you find my labour of love useful. If you do, please leave feedback or star the project on GitHub. Thank you in advance.

As the author of an open-source project, I now realise that it is when people like the project that its author feels encouragement and unbounded joy.

Thank you!