The Origin of the Terms Big-Endian and Little-Endian (2003)

ling.upenn.edu

23 points by cluckindan 7 days ago · 22 comments


The bug is in the positional number system. It would make much more sense to add digits to the right and left-align everything.

Hex dumps would make sense, columns in spreadsheets would be left aligned always, and so on.

But we're too late for that.

gnabgib 7 days ago

Related: Internet Experiment Note 137 https://www.rfc-editor.org/ien/ien137.txt

jmclnx a day ago

IIRC, Mark Williams Company had a patent on the ability to run the same binary on both big and little endian machines.

https://patents.justia.com/assignee/mark-williams-company

https://en.wikipedia.org/wiki/Mark_Williams_Company

I wonder what happened to it and why Linux could not use it. I think it was released under a free licence.

rep_lodsb a day ago

That just describes using conversion routines for data, after loading and before storing to some standardized on-disk / network format.
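
A minimal sketch of the kind of conversion routines described here, assuming a made-up record layout (a 16-bit version followed by a 32-bit length) whose on-disk order is fixed as big-endian. The names `RECORD`, `load_record` and `store_record` are hypothetical and not taken from the patent or from any real file format.

```python
import struct

# Hypothetical on-disk layout, always big-endian whatever the host CPU is:
#   u16 version, u32 length   (">" also disables native padding)
RECORD = struct.Struct(">HI")

def load_record(raw: bytes) -> tuple:
    """Convert from the standardized on-disk order to host integers."""
    return RECORD.unpack(raw)

def store_record(version: int, length: int) -> bytes:
    """Convert host integers back to the standardized on-disk order."""
    return RECORD.pack(version, length)

raw = store_record(1, 0x01020304)
print(raw.hex(" "))        # 00 01 01 02 03 04
print(load_record(raw))    # (1, 16909060)
```

Because the byte order is named in the format string, the same code gives the same bytes on big- and little-endian hosts; the conversion only happens at the load/store boundary.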

RcouF1uZ4gsC 2 days ago

I think there is peace now because little-endian won.

All modern CPUs are little-endian (or dual selectable).

Other than backward compatibility, there is no need for big-endian.

mwkaufma a day ago

Network byte order.
- dontlaugh a day ago
  
  Which will also become a historical artifact as new protocols are made to use little endian.
  - nineteen999 a day ago
    
    For which protocols? TCP/IP itself is network byte order all the way down to the bottom of the jar.
    
    dontlaugh a day ago
    
    For example, Cap'n Proto and QUIC are both little endian.
    TCP is becoming increasingly less relevant, although I don't know if it'll ever actually disappear.
    
    nineteen999 a day ago
    
    Cap'n Proto and QUIC are layer 6 and 7 protocols (presentation and application, respectively). QUIC is built on top of UDP.
    Layers 3-4 (network, transport) are both big-endian - IP packet headers and TCP/UDP headers use big-endian format.
    This means you can't have an IP stack (let alone TCP/UDP, QUIC, Cap'n Proto) that's little-endian all the way through without breaking the internet.
    Outside the webdev bubble, it's pretty much QUIC that is irrelevant - it's just another UDP based application protocol.
    
    dontlaugh a day ago
    
    UDP is an implementation detail of QUIC, just a way to give IP-ish functionality to userspace. In practice, QUIC is a TCP alternative.
    The OSI layer model is not necessarily as relevant as it used to be.
    
    nineteen999 18 hours ago
    
    You're kind of saying "look over here!" but I'm not that easily distracted. You said "Which will also become a historical artifact as new protocols are made to use little endian". It's never going to become a historical artifact in our lifetimes. As the peer poster pointed out, QUIC itself has big-endian header fields. IPv4/IPv6 both use big-endian at layer 3.
    The OSI layer model is extremely relevant to the Cisco network engineers running the edges of the large FAANG companies, hyperscalers etc. that connect them to the internet.
    
    dontlaugh 18 hours ago
    
    I was wrong about QUIC; for some reason I was sure I'd read it's little-endian.
    I'm just pointing out that UDP is an extremely thin wrapper over IP and the preferred way of implementing new protocols. It seems likely we'll eventually replace at least some of our protocols and deprecate old ones, and I was under the impression that new ones tended to be little-endian.
    
    Veserv a day ago
    
    Foolishly, QUIC is not little-endian [1]. The headers are defined to be big-endian. Though, obviously, none of UDP, TCP, or QUIC define the endianness of their payload so you can at least kill it at that layer.
    [1] https://www.rfc-editor.org/rfc/rfc9000.html#name-notational-...
    
    dontlaugh a day ago
    
    Oh really? I must’ve misread.
- flohofwoe a day ago
  
  Modern-ish CPUs have instructions to load big-endian data without having to switch into a special 'big-endian mode', and compilers can optimize into those instructions so languages don't need to add special intrinsics:
  https://www.godbolt.org/z/q3hMPq78v
  ...but even without specialized instructions the transformation should be pretty much free on pipelined CPUs (compared to a memory load anyway).
  - rep_lodsb a day ago
    
    It's still one more thing you need to keep in mind when writing code, at least in languages that don't have a separate data type for different-endian fields.
- Gibbon1 a day ago
  
  Yeah people need to stop doing that going forward. It makes driver code a royal pain in the ass.
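
For the sub-thread above on network byte order, here is a small sketch of reading big-endian header fields with plain shifts, so the result does not depend on the host's byte order. It uses the four 16-bit fields of a UDP header as the example; the helper names `load_be16`, `load_be32` and `parse_udp_header` are made up for illustration. Python is used only to show the logic; the claim about compilers collapsing this pattern into one instruction applies to compiled languages such as C.

```python
def load_be16(buf: bytes, offset: int = 0) -> int:
    """Read a 16-bit big-endian value: first byte is the most significant."""
    return (buf[offset] << 8) | buf[offset + 1]

def load_be32(buf: bytes, offset: int = 0) -> int:
    """Read a 32-bit big-endian value with the same shift-and-or pattern."""
    return ((buf[offset] << 24) | (buf[offset + 1] << 16)
            | (buf[offset + 2] << 8) | buf[offset + 3])

def parse_udp_header(datagram: bytes) -> dict:
    """Decode the 8-byte UDP header; all four fields are big-endian."""
    if len(datagram) < 8:
        raise ValueError("UDP header needs at least 8 bytes")
    return {
        "src_port": load_be16(datagram, 0),
        "dst_port": load_be16(datagram, 2),
        "length": load_be16(datagram, 4),
        "checksum": load_be16(datagram, 6),
    }

header = bytes([0x30, 0x39, 0x00, 0x35, 0x00, 0x08, 0x00, 0x00])
print(parse_udp_header(header))
# {'src_port': 12345, 'dst_port': 53, 'length': 8, 'checksum': 0}
```

The built-in `int.from_bytes(buf, "big")` gives the same values; the explicit shifts are shown because they mirror what one would write in a language without a byte-order-aware integer type.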
weebull a day ago

There's one area I wish we did differently, which I think is a hangover from big-endian: the order of bytes when we write out hex dumps of memory.
You'll always get something like this:
```
00000000 : 00 01 02 03 04 05 06 07
00000008 : 08 09 0A 0B 0C 0D 0E 0F
```
On a big-endian machine, when you wrote 0x1234 to address 0x00000000 you got:
```
00000000 : 12 34 02 03 04 05 06 07
00000008 : 08 09 0A 0B 0C 0D 0E 0F
```
On a little-endian machine you have to either do mental gymnastics to reorder the bytes, or set the item size to match your data item size.
```
00000000 : 34 12 02 03 04 05 06 07
00000008 : 08 09 0A 0B 0C 0D 0E 0F
```
If we wrote the bytes with the LS byte on the right (just as we do for bits) then it wouldn't be an issue.
```
00000000 : 07 06 05 04 03 02 12 34
00000008 : 0F 0E 0D 0C 0B 0A 09 08
```
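
The two layouts for the little-endian case can be reproduced with a short sketch. The function name `hexdump` and its `ls_byte_right` flag are hypothetical, chosen only to mirror the dumps above; this is not how any particular tool implements its output.

```python
def hexdump(data: bytes, width: int = 8, ls_byte_right: bool = False) -> str:
    """Format memory as 'ADDRESS : bytes', optionally lowest address on the right."""
    lines = []
    for base in range(0, len(data), width):
        row = data[base:base + width]
        if ls_byte_right:
            row = row[::-1]  # highest address first, lowest address last
        cells = " ".join(f"{b:02X}" for b in row)
        lines.append(f"{base:08X} : {cells}")
    return "\n".join(lines)

# 16 bytes of memory, then a little-endian store of 0x1234 at address 0.
mem = bytearray(range(16))
mem[0:2] = (0x1234).to_bytes(2, "little")

print(hexdump(mem))                      # conventional: 34 12 02 03 ...
print(hexdump(mem, ls_byte_right=True))  # proposed:     ... 03 02 12 34
```

With the lowest address on the right, the stored value reads as `12 34` again. The sketch does not pad a short final row, so a buffer whose length is not a multiple of `width` would lose column alignment in the reversed layout.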
- rep_lodsb a day ago
  
  !tfel-ot-thgir eb dluow ,redro dleif sa llew sa ,sgnirts lla neht tuB
  It could be argued that little endian is the more natural way to write numbers anyway, for both humans and computers. The positional numbering system came to the West via Arabic, after all.
  Most of the confusion when reading hex dumps seems to arise from the way the two nibbles of each byte, written in the familiar left-to-right order, clash with the order of bytes in a larger number. Swap the nibbles, and you get "43 21", which would be almost as easy to read as "12 34".
  - Veserv 18 hours ago
    
    Yep. We even have a free bit when writing hex numbers like 0x1234. Just flip that 0x to a 1x to indicate you are writing in little-endian and you get nice numbers like 1x4321 that are totally unambiguous little-endian hex representations.
    You can apply that same formatting to little-endian bit representations by using 1b instead of 0b and you could even do decimal representations by prefixing with 1d.
  - Gibbon1 18 hours ago
    
    For me I think the issue is the way you think of memory.
    You can think of memory as a store of register-sized values. Big endian sort of makes some sense when you think of it that way.
    Or you can think of it as arbitrarily sized data. If it's arbitrary data, then big endian is just a pain in the ass. And code written to handle both big and little endian is obnoxious.
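
The nibble-swap and `1x` prefix ideas from this sub-thread can be sketched as follows. The `1x` notation is the commenters' proposal, not something any language or tool accepts; the function names below are hypothetical.

```python
def swap_nibbles(byte: int) -> int:
    """Exchange the high and low 4 bits of one byte, e.g. 0x34 -> 0x43."""
    return ((byte & 0x0F) << 4) | (byte >> 4)

def to_1x(value: int) -> str:
    """Write a non-negative integer with the least significant hex digit first."""
    if value < 0:
        raise ValueError("only non-negative values are handled in this sketch")
    return "1x" + f"{value:X}"[::-1]

def from_1x(text: str) -> int:
    """Parse the proposed little-endian '1x' notation back to an integer."""
    if not text.startswith("1x"):
        raise ValueError("expected a '1x' prefix")
    return int(text[2:][::-1], 16)

# 0x1234 stored little-endian is the byte sequence 34 12.
stored = (0x1234).to_bytes(2, "little")
print(" ".join(f"{swap_nibbles(b):02X}" for b in stored))  # 43 21
print(to_1x(0x1234))                                       # 1x4321
print(hex(from_1x("1x4321")))                              # 0x1234
```

Swapping the nibbles makes every hex digit run least significant first, which is why the dump `34 12` becomes `43 21`, the same digit order as the proposed `1x4321`.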
