Perception of Probability Words

Overview

Articles everywhere use probabilistic words to describe events. Here are just a few examples I found in a quick internet search:

"'Highly unlikely' State of the Union will happen amid shutdown" – The Hill
"Tiger Woods makes Masters 15th and most improbable major" – Fox Business
"Trump predicts 'very good chance' of China trade deal" – CNN

A study in the 1960s explored the perception of probabilistic words like these among NATO officers. Curious about how this differs today, I asked my connections on social media to take a survey on their perception of the same probabilistic words used in the NATO study. A simple visualization shows the perceptions of the 123 people who responded.



Perceptions

In general, our perceptions of probabilistic words have changed very little since the studies of the 1960s. The first clear trend in the aggregated individual responses is that nearly everyone chooses probabilities that end in a 0 or a 5, like 20% or 85%. Of all 2,091 responses, 1,795 (85.8%) end in a 0 or 5, so we could say there is a very good chance your response will end in a 0 or 5.
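
The rounding tally is easy to reproduce. Here is a minimal Python sketch, assuming responses are stored as percentages and treating "ends in a 0 or 5" as "is a whole multiple of 5". The function name and the sample list are hypothetical and are not the survey data:

```python
def share_ending_in_0_or_5(responses):
    """Return the fraction of responses that are whole multiples of 5."""
    if not responses:
        raise ValueError("responses must not be empty")
    rounded = sum(1 for r in responses if r % 5 == 0)
    return rounded / len(responses)


# Hypothetical sample, not the survey data.
sample = [20, 85, 50, 33, 97.5, 10, 75, 2]
print(f"Sample share: {share_ending_in_0_or_5(sample):.1%}")

# The figure reported in the article: 1,795 of 2,091 responses.
print(f"Reported share: {1795 / 2091:.1%}")
```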

A second clear trend is that some probabilistic words have a narrower range of perception than others. In the box plot visualization, the shaded box spans the middle 50% of responses. This span is known as the interquartile range, or IQR. For example:

  • The IQR of "about even" is 0% (the box is a single line): the middle 50% of all respondents perceive "about even" to be 50%.
  • The IQR of "we believe" is among the largest (the box covers 20%): the middle 50% of all respondents perceive "we believe" to be between 65% and 85%.
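
For readers who want to compute an IQR themselves, here is a minimal sketch using Python's standard library. The article does not say which quartile convention it used, so the "inclusive" method below is an assumption, and the response lists are hypothetical values chosen for illustration, not the survey data:

```python
from statistics import quantiles


def middle_half(responses):
    """Return (q1, q3, iqr) for a list of percentage responses."""
    q1, _median, q3 = quantiles(responses, n=4, method="inclusive")
    return q1, q3, q3 - q1


# Hypothetical responses, not the survey data.
samples = {
    "We Believe": [50, 60, 65, 70, 75, 80, 85, 90, 95],
    "About Even": [40, 45, 50, 50, 50, 50, 50, 50, 50, 55, 60],
}

# Rank phrases from the narrowest IQR to the widest.
ranked = sorted(samples, key=lambda phrase: middle_half(samples[phrase])[2])
for phrase in ranked:
    q1, q3, iqr = middle_half(samples[phrase])
    print(f"{phrase}: IQR {iqr:.1f}, middle 50% from {q1:.1f}% to {q3:.1f}%")
```

Different quartile conventions can give slightly different boundaries on small samples, so results from another tool may not match exactly.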

The following table ranks every word by the width of the IQR:

| Probability Word | IQR | Middle 50% |
|---|---|---|
| About Even | 0.0% | 50.0% - 50.0% |
| Almost No Chance | 4.0% | 1.0% - 5.0% |
| Better than Even | 5.0% | 55.0% - 60.0% |
| Highly Unlikely | 5.0% | 5.0% - 10.0% |
| Almost Certain | 8.0% | 90.0% - 98.0% |
| Likely | 10.0% | 65.0% - 75.0% |
| Chances are Slight | 10.0% | 10.0% - 20.0% |
| Little Chance | 10.0% | 5.0% - 15.0% |
| Highly Likely | 15.0% | 80.0% - 95.0% |
| Probable | 15.0% | 60.0% - 75.0% |
| Very Good Chance | 15.0% | 75.0% - 90.0% |
| Probably Not | 15.0% | 15.0% - 30.0% |
| Probably | 15.0% | 60.0% - 75.0% |
| Improbable | 17.5% | 5.0% - 22.5% |
| We Believe | 20.0% | 65.0% - 85.0% |
| We Doubt | 20.0% | 10.0% - 30.0% |
| Unlikely | 20.0% | 10.0% - 30.0% |


Background

The first widely published work to analyze the perception of probabilistic words was written by Sherman Kent while he was working for the CIA. Originally a classified work, "Words of Estimative Probability" was published in Studies in Intelligence in 1964. In it, Kent assigned a probability and a range to several key terms and proposed the resulting scale for use by the CIA:

Kent's Work (1964): Proposed scale for CIA officers (first four columns)
This Survey (2019): Internet survey of primarily undergraduate students (last three columns)

| Kent: Word | Kent: Words with the same "linguistic expression" | Kent: Probability | Kent: Proposed Range | Survey: Word | Survey: Median | Survey: Middle 50% |
|---|---|---|---|---|---|---|
| Certain | | 100.0% | 100.0% - 100.0% | | | |
| Almost Certain | Virtually Certain; All but Certain; Highly Probable; Highly Likely; Odds Overwhelming | 93.0% | 87.0% - 99.0% | Almost Certain | 95.0% | 90.0% - 98.0% |
| | | | | Highly Likely | 90.0% | 80.0% - 95.0% |
| Probable | Conceivable; Could; May; Might; Perhaps | 75.0% | 63.0% - 87.0% | Very Good Chance | 80.0% | 75.0% - 90.0% |
| | | | | We Believe | 75.0% | 65.0% - 85.0% |
| | | | | Probably | 70.0% | 60.0% - 75.0% |
| | | | | Probable | 70.0% | 60.0% - 75.0% |
| | | | | Likely | 70.0% | 65.0% - 75.0% |
| Chances About Even | Chances about Even; Chances a Little Better than Even; Chances a Little Worse than Even; Improbable; Unlikely | 50.0% | 40.0% - 60.0% | Better than Even | 60.0% | 55.0% - 60.0% |
| | | | | About Even | 50.0% | 50.0% - 50.0% |
| Probably Not | We Believe that Not; We Estimate that Not; We Doubt; Doubtful | 30.0% | 20.0% - 40.0% | Probably Not | 25.0% | 15.0% - 30.0% |
| | | | | We Doubt | 20.0% | 10.0% - 30.0% |
| | | | | Unlikely | 20.0% | 10.0% - 30.0% |
| Almost Certainly Not | Virtually Impossible; Almost Impossible; Some Slight Chance; Highly Doubtful | 7.0% | 2.0% - 12.0% | Little Chance | 10.0% | 5.0% - 15.0% |
| | | | | Chances are Slight | 10.0% | 10.0% - 20.0% |
| | | | | Improbable | 10.0% | 5.0% - 22.5% |
| | | | | Highly Unlikely | 5.0% | 5.0% - 10.0% |
| | | | | Almost No Chance | 2.0% | 1.0% - 5.0% |
| Impossible | | 0.0% | 0.0% - 0.0% | | | |
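
One way to read the comparison above is to check whether each survey median falls inside the range Kent proposed for the term it is grouped with. The sketch below is an illustration and not part of the original analysis. The numbers are transcribed from the table, the grouping follows the table's layout as I read it, and the function name is hypothetical:

```python
# Kent's proposed ranges (low, high) in percent, transcribed from the table.
KENT_RANGES = {
    "Almost Certain": (87.0, 99.0),
    "Probable": (63.0, 87.0),
    "Chances About Even": (40.0, 60.0),
    "Probably Not": (20.0, 40.0),
    "Almost Certainly Not": (2.0, 12.0),
}

# Survey medians, grouped under the Kent term they appear beside in the table.
SURVEY_MEDIANS = {
    "Almost Certain": {"Almost Certain": 95.0, "Highly Likely": 90.0},
    "Probable": {
        "Very Good Chance": 80.0, "We Believe": 75.0, "Probably": 70.0,
        "Probable": 70.0, "Likely": 70.0,
    },
    "Chances About Even": {"Better than Even": 60.0, "About Even": 50.0},
    "Probably Not": {"Probably Not": 25.0, "We Doubt": 20.0, "Unlikely": 20.0},
    "Almost Certainly Not": {
        "Little Chance": 10.0, "Chances are Slight": 10.0, "Improbable": 10.0,
        "Highly Unlikely": 5.0, "Almost No Chance": 2.0,
    },
}


def medians_outside_kent_range(ranges, medians_by_term):
    """List (kent_term, survey_word, median) for medians outside Kent's range."""
    outside = []
    for term, medians in medians_by_term.items():
        low, high = ranges[term]
        for word, median in medians.items():
            if not low <= median <= high:  # endpoints count as inside
                outside.append((term, word, median))
    return outside


print(medians_outside_kent_range(KENT_RANGES, SURVEY_MEDIANS))
```

With these transcribed values the list comes back empty: every survey median lies within Kent's proposed range for its group, endpoints included. The middle-50% ranges are a different matter and can extend beyond Kent's bounds.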

Over a decade later, Scott Barclay et al., working for the Advanced Research Projects Agency, published a 285-page book entitled "Handbook for Decisions Analysis" for the Department of Defense. This work references Kent's work and a NATO study in which "twenty-three [NATO] officers, ranking from squadron leader to lieutenant general" were asked about many probabilistic words. Specifically:

Several different sentences were constructed in the following manner. "It is highly likely that the Soviets will invade Czechoslovakia," or "It is almost certain that the Soviets will invade Czechoslovakia," or "We believe that the Soviets will invade Czechoslovakia." The basic structure of all sentences remained constant; only the verbal qualifiers changed.

The results of this survey of NATO officers produced the first visualization, still widely available today, to combine Kent's work with a human survey. In presenting the visualization, Barclay comments: "Clearly, the readers in this experiment were not using the Sherman Kent scale even though they were familiar with it." Barclay's visualization was later recreated by others with modern typesetting.

The work of assigning specific probabilities to probabilistic words has also been studied in other fields that deal with uncertainty, including metrology and medicine. For example, Bernie J. O'Brien's 1989 paper "Words or numbers? The evaluation of probability expressions in general practice" surveyed "communicating to patients the probability of a side-effect (headache) arising from an unspecified prescription medicine." O'Brien's work surveyed 52 general practitioners, used Spearman's rank correlation coefficient to determine an "ambiguity ranking" for each word, and used a scatter plot to compare IQR against ambiguity:

Probability ratings of 23 phrases by 52 general practitioners

Relationship between observed and predicted variability in meaning

This work was popularized online in 2015 when Reddit user /u/zonination performed an internet survey similar to the one done for this work. In zonination's survey, 48 users responded. zonination uploaded the data, alongside a visualization made in R, to GitHub and shared it on Reddit. The post won the 2015 Kantar Information is Beautiful Award and led to numerous articles across various websites and blogs:

  • "Here's how people view the difference between something being 'highly likely' and it being just 'probable'" - Business Insider
  • "Measuring Perceptions of Uncertainty" - Visual Capitalist
  • "Perceptions of probability" - SAS Blogs
  • ...and many others...

Human perception of probabilistic words remains a somewhat active area of research, with peer-reviewed academic papers published as recently as last year.


Data Set

Complete Data Set (CSV): https://github.com/wadefagen/datasets/tree/master/Perception-of-Probability-Words

  • Largest known open-source survey of probabilistic words (n=123).
  • CSV Format Details: Row 1 contains descriptive column headers; all other rows contain data.
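
Given that layout (headers in row 1, data in every other row), a file like this could be read with Python's built-in csv module. This is only a sketch. The column names and values below are hypothetical stand-ins, and it assumes one column per phrase and that every cell holds a number, which may not match the real file:

```python
import csv
import io

# Hypothetical stand-in for the real file; column names are assumptions.
SAMPLE_CSV = """Almost Certain,About Even,Highly Unlikely
95,50,5
90,50,10
99,45,2
"""


def read_responses(csv_file):
    """Map each column header (row 1) to its list of numeric responses."""
    reader = csv.DictReader(csv_file)  # uses row 1 as the headers
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if name is None or value in (None, ""):
                continue  # skip extra or blank cells
            columns[name].append(float(value))
    return columns


responses = read_responses(io.StringIO(SAMPLE_CSV))
for phrase, values in responses.items():
    print(phrase, values)
```

To use the real data, pass an open file handle for the downloaded CSV in place of the io.StringIO object. If the file contains non-numeric columns, the float conversion would need to be adjusted.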

A previous version of this page listed an incorrect publication date for "Handbook for Decisions Analysis" and has been corrected. Thanks to Dr. Charles Twardy for the correction.