Relative frequency of SI units and prefixes – Negative Feedback

I am a huge fan of the SI system, but I’ve often noticed that even my academic colleagues take a mix-and-match approach to the BIPM standards. It is common for chemists to use ångström over nanometre, and I don’t think I’ve ever seen anyone use megagram over tonne.

Perhaps people are so used to mg (milligram) that they instinctively feel that Mg (megagram) would be confusing, and I admit Gg looks a bit weird. Is that a trend? Do people avoid repeated letters in their unit names? Mm (megameter) does feel a bit odd, although mm (millimeter) feels much more familiar.

These questions drove me to quantify the relative frequency of the SI units and prefixes.

Google search results

I used Google search results as a proxy for usage. I also tried the Google Ngram project, but that turned out to be harder than I wanted it to be. After downloading all the data and parsing some of it, I found that it didn’t contain all the terms I wanted.

Here is the complete table showing log10(Google search results + 1) for every combination of the SI prefixes with the units mentioned in the SI brochure as base units or derived units. I’ve skipped degrees Celsius as the Google search API kept erroring on searches like “quettaCelsius”.
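As a minimal sketch of the transform used for the table, assuming the raw counts are held in a plain dictionary: the `+ 1` keeps terms with zero results finite, so they map to 0 instead of minus infinity. The function name `log_count` and the counts below are placeholders for illustration, not real search results.

```python
import math


def log_count(hits):
    """Return log10(hits + 1), so a term with zero hits maps to 0.0."""
    return math.log10(hits + 1)


# Placeholder counts for illustration only, NOT real search results.
example_counts = {"kilogram": 999, "quettasteradian": 0}

# One table cell per search term.
table = {term: log_count(hits) for term, hits in example_counts.items()}
```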

It’s always interesting when you get results like this – you can immediately start seeing some structure.

These things jump out at me:

  • quecto, ronto, ronna, and quetta were added in 2022 and you can clearly see they haven’t been as widely adopted as the other prefixes
  • No-one uses prefixes with steradian
  • The “-er” spellings of metre and litre are more popular than the “-re” spellings that the BIPM uses, but not by much (I’m not sure if this is an artefact of Google combining results for the two)
  • For tonne, the negative power prefixes (milli) are much less popular than the positive ones (mega).

Basic stats

Here are the counts across all prefixes for the units. It’s obvious that the counts for the unprefixed units are somewhat skewed by common words like “gray” or names like “Henry”. I’d imagine Pascal and Newton have the same problem.

Then there are the prefixes, which I’ve kept in their power order.

I was a bit surprised at how unpopular deca/deci are. I think there is just a strong preference for powers of three in the “engineering units”.

Tonne

Let’s dig into more detail on the mass units (gram and tonne). I’ve plotted equal masses at the same vertical height to compare. I was surprised that only tonne meaningfully shadows megagram. On this view it’s also quite clear how infrequently people use tonne with negative power prefixes.
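Lining up equal masses comes down to adding exponents: a tonne is 10^6 g, so a prefixed tonne equals the gram term whose power of ten is six higher. The following is a rough sketch of that bookkeeping, not the plotting code used for the chart; the names `mass_exponent` and `group_by_mass` are made up for this example.

```python
# SI prefix names mapped to their power of ten ("" means no prefix).
PREFIX_EXPONENTS = {
    "quecto": -30, "ronto": -27, "yocto": -24, "zepto": -21, "atto": -18,
    "femto": -15, "pico": -12, "nano": -9, "micro": -6, "milli": -3,
    "centi": -2, "deci": -1, "": 0, "deca": 1, "hecto": 2, "kilo": 3,
    "mega": 6, "giga": 9, "tera": 12, "peta": 15, "exa": 18, "zetta": 21,
    "yotta": 24, "ronna": 27, "quetta": 30,
}

# Each unit as a power of ten in grams: 1 tonne = 10**6 g.
UNIT_EXPONENTS = {"gram": 0, "tonne": 6}


def mass_exponent(prefix, unit):
    """Power of ten, in grams, of one prefixed unit."""
    return PREFIX_EXPONENTS[prefix] + UNIT_EXPONENTS[unit]


def group_by_mass(prefixes, units):
    """Group terms such as 'megagram' and 'tonne' that denote the same mass."""
    groups = {}
    for prefix in prefixes:
        for unit in units:
            key = mass_exponent(prefix, unit)
            groups.setdefault(key, []).append(prefix + unit)
    return groups
```

Terms that land in the same group, such as megagram and tonne, or kilogram and millitonne, would share a row on a chart like the one described.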

Spelling

The “-er” spelling is consistently higher than the “-re” spelling for meter/metre and liter/litre. The following chart shows the ratio of the spellings for different prefixes.
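A ratio like this could be computed along the following lines. This is only a sketch: it assumes the ratio is taken as “-er” hits divided by “-re” hits and that the counts sit in a dictionary keyed by search term. The function name `spelling_ratio` is hypothetical and the numbers are placeholders, not real search results.

```python
def spelling_ratio(counts, prefix, er="meter", re="metre"):
    """Return (-er hits) / (-re hits) for one prefix, or None if undefined."""
    er_hits = counts.get(prefix + er, 0)
    re_hits = counts.get(prefix + re, 0)
    if re_hits == 0:
        return None  # avoid dividing by zero when the "-re" term has no hits
    return er_hits / re_hits


# Placeholder counts for illustration only, NOT real search results.
example_counts = {"meter": 300, "metre": 100, "kilometer": 50, "kilometre": 0}

ratios = {p: spelling_ratio(example_counts, p) for p in ["", "kilo"]}
```

A value above 1 means the “-er” spelling had more hits for that prefix.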

Coding information

The basics of the search code are pretty simple:

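import time

import requests

def search(term):
    # URL and BASE_PARAMS (the Custom Search API details) are defined elsewhere.
    response = requests.get(URL, params=BASE_PARAMS | {"q": term, "exactTerms": term, "num": 0})
    time.sleep(0.5)  # brief pause between requests

    try:
        return int(response.json()["queries"]["request"][0]["totalResults"])
    except Exception:
        # Unexpected response: print it for inspection and return None.
        print(response.json())
        return None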

Then a loop like

from itertools import product

for prefix, unit in product(prefixes, units):
    searchterm = f"{prefix}{unit}"

    if searchterm not in db:
        db[searchterm] = search(searchterm)

Some details have been omitted from this code for brevity: db is a shelve database. I’m using requests for the API access and BASE_PARAMS contains the details for my Google Custom Search API engine, which means that I had to pay $2 of my own money to get the answers I was craving.
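For anyone wanting to reproduce the caching part, here is one way the pieces might fit together. It is a sketch and not the code I actually ran: `collect` and its `db_path` argument are made up for illustration, the search function is passed in so it can be swapped for a stub, and, unlike the loop above, failed searches (`None`) are deliberately not cached so that they get retried on the next run.

```python
import shelve
from itertools import product


def collect(db_path, prefixes, units, search):
    """Look up every prefix+unit term once, caching results in a shelve file."""
    with shelve.open(db_path) as db:
        for prefix, unit in product(prefixes, units):
            term = f"{prefix}{unit}"
            if term not in db:
                result = search(term)
                # Assumption: skip failed lookups so they are retried later.
                if result is not None:
                    db[term] = result
        # Copy to a plain dict before the shelf is closed.
        return dict(db)
```

Because the shelf persists on disk, rerunning the script only spends API calls on terms that are still missing.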