Ask HN: How would you explain an ABI?

6 points by dankoncs 4 years ago · 10 comments · 2 min read


This would be my (possibly wrong) explanation:

So in Linux we have syscalls such as write, read, close, socket, ... These syscalls are defined according to the POSIX standard. POSIX is basically a standard for operating systems. Linux/Unix follow these standards. So syscalls are an interface so that we can communicate with the OS. (Is it the application binary interface already?)

Now, we have the C standard library, so we don't need to deal with syscalls (or specific OS stuff). The C standard library is basically an application programming interface (API). So printf, fopen etc. map to write, open, etc. respectively:

printf ~ write

fopen ~ open

fclose ~ close

...
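
To make the mapping above concrete, here is a minimal sketch, assuming a POSIX system. The helper names `emit_with_stdio` and `emit_with_write` are made up for illustration. Both put the same bytes on a file descriptor: the first goes through the C library's buffered stream layer, the second calls the thin `write` wrapper directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: send msg to fd through the C library's stream layer. */
int emit_with_stdio(int fd, const char *msg) {
    int copy = dup(fd);                 /* so fclose() does not close the caller's fd */
    if (copy < 0) return -1;
    FILE *stream = fdopen(copy, "w");
    if (stream == NULL) { close(copy); return -1; }
    int n = fprintf(stream, "%s", msg); /* bytes land in a user-space buffer first */
    if (fclose(stream) != 0) return -1; /* flushing hands the buffer to write() */
    return n;
}

/* Hypothetical helper: the same bytes, straight through the write() wrapper. */
int emit_with_write(int fd, const char *msg) {
    return (int)write(fd, msg, strlen(msg));
}
```

Calling either helper with `STDOUT_FILENO` prints the message. The difference is only in how many library layers sit between the call and the kernel.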

Then, apparently, we require a runtime environment (RTE). This runtime environment (is perhaps) the standard library that is a .so file (a shared object). Shared objects are (perhaps) loaded by the OS to a specific memory location so that all C programs can access that .so file (which is now loaded in memory for all C executables to use.) So that .so file is the "RTE" and the syscalls are the "ABI"?

If so, I find "ABI" and "RTE" confusing. Java has an RTE. You need the Java Virtual Machine (an abstract CPU essentially) to run your .class files within that virtual machine (JVM).

Again, I am unsure about this, so please go easy on me. Thank you!

PS: Why do people like Jason Turner say: "break the ABI to save C++"? So break the .so files for C++'s standard library? What for? (Shouldn't it be: "break C++'s RTE"?)

retrac 4 years ago

ABI is not quite the same as API. While an ABI is an API, not all APIs are ABIs.

Binary interface vs. programming interface.

An ABI is a precise, low-level, bit-level machine definition of the register values, etc. used to invoke library functions, e.g. "call the function at 0x32918333 with 0xabcdef12". C++ has gotten terribly crufty in that regard. In practice there's no guarantee that linking together something built by multiple compilers with different flags will work, because of ABI incompatibilities and flaws. Though if you recompile it all from scratch you're fine -- it's all the same API.
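
One concrete piece of that bit-level contract is struct layout. The sketch below is illustrative only: the struct and function names are made up, and `#pragma pack` is a compiler extension (supported by GCC, Clang and MSVC), not standard C. It shows the same two fields ending up at different offsets depending on a packing setting, which is the kind of mismatch that bites when objects built with different settings are linked together.

```c
#include <stddef.h>

/* Default layout: the compiler may insert padding after 'tag'. */
struct record_natural {
    char tag;
    int  value;
};

#pragma pack(push, 1)
/* Packed layout: same fields, but no padding between them. */
struct record_packed {
    char tag;
    int  value;
};
#pragma pack(pop)

/* Offsets that callers and callees would each bake into their machine code. */
size_t natural_value_offset(void) { return offsetof(struct record_natural, value); }
size_t packed_value_offset(void)  { return offsetof(struct record_packed, value); }
```

The source-level API (`record.value`) is identical in both cases. Only the binary layout differs, and that is the part an ABI pins down.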

Stuff like "break the ABI to save C++" means: recognize the above truth, sweep the current stuff away, and reach a consensus on an actual stable ABI we can live with.

  • dankoncsOP 4 years ago

    Okay, let me give another try for this:

    So then we have 2 APIs. That is, we have the C standard library (as an API) and the POSIX API. The POSIX API defines the syscalls such as write, read, open, ... The C or C++ standard library provides the header files (function declarations etc.) and its .so file (the shared library) is in memory. Take the Microsoft C++ Compiler as an example. I cannot see how the standard library is implemented, because only the headers are defined and the actual compiled code "lives" as a .so file in memory. This .so file as an object in memory is accessible to all C/C++ applications. The .so file is essentially C's runtime environment (RTE). (Also, C and C++ are standardized, so no matter who implements the compiler and the standard library, it has to follow the language standard.)

    The standard library, among other things, not only provides a wrapper for syscalls such as malloc and printf (POSIX API), it also provides some useful functions or algorithms such as qsort, std::transform... Furthermore, data structures or containers can also be part of a standard library: std::vector, std::unordered_map, ...

    Now, if we compile a C or C++ program, the ABI is basically a set of definitions/rules that the compiler has to abide by. Meaning things like how function parameters are stored (on the stack or in registers), how a function should be called, how arguments are passed, how operations are mapped to machine code etc. But not only is an ABI a mapping or layout between C instructions and machine code, it is also a mapping between OS syscalls and C or C++ code.

    Exception handling in C and POSIX is basically this:

    #include <stdio.h>
    #include <setjmp.h>

    int main(void) {
      jmp_buf env;

      if (!setjmp(env)) {
        /* try: setjmp returns 0 on the first pass and bookmarks this point */
        longjmp(env, 1); /* throw: jump back to the bookmark; setjmp now returns 1 */
      } else {
        /* catch: reached only via longjmp */
        fprintf(stderr, "Yikes, something went wrong! ;)\n");
      }

      return 0;
    }

    So if I do exception handling in C++, then the ABI of C++ should follow an exception handling routine. For example, virtual functions in C are basically this:

    https://godbolt.org/z/x94cdb1Y5

    So vtables are structs of function pointers. The C++ compiler has to follow some ABI convention (namely, some rule for how to lay out a vtable so that it corresponds to the C++ code).
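
    As a rough illustration of "structs of function pointers" (my own sketch with made-up names, not the code behind the godbolt link), a hand-rolled vtable in C might look like the following. A real C++ ABI, such as the Itanium C++ ABI, additionally fixes where the vtable pointer sits in the object and in what order the slots appear.

    ```c
    struct shape; /* forward declaration so the vtable can mention it */

    /* The "vtable": one function pointer per virtual method. */
    struct shape_vtable {
        int (*area)(const struct shape *self);
    };

    /* Every object carries a pointer to its type's vtable. */
    struct shape {
        const struct shape_vtable *vptr;
        int width;
        int height;
    };

    static int rect_area(const struct shape *self) { return self->width * self->height; }
    static int triangle_area(const struct shape *self) { return self->width * self->height / 2; }

    static const struct shape_vtable rect_vtable = { rect_area };
    static const struct shape_vtable triangle_vtable = { triangle_area };

    struct shape make_rect(int w, int h) {
        struct shape s = { &rect_vtable, w, h };
        return s;
    }

    struct shape make_triangle(int w, int h) {
        struct shape s = { &triangle_vtable, w, h };
        return s;
    }

    /* The "virtual call": look up the slot at run time, then call through it. */
    int shape_area(const struct shape *s) {
        return s->vptr->area(s);
    }
    ```

    Two separately compiled pieces of code can only share such objects if they agree on the position of `vptr` and on which slot holds `area`.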

    - syscalls ~ POSIX API (operating systems API) ~ write, read, open, ...

    - C/C++ standard library (.so files loaded in computer memory) ~ wrapper for syscalls (printf, malloc, ...), containers/data structures/algorithms (std::vector, std::unordered_map, qsort, ...)

    - ABI ~ rules for the compiler that tell how vtables, function calls, exceptions should be implemented

    Correct so far? Or am I still wrong?

karmakaze 4 years ago

The basic difference is that the API is a specification at a level higher than the hardware, e.g. source language. An application binary interface specification is at the machine code level. Often compilers, linkers, and operating system conventions are all tied together and the libraries and calling programs follow the same rules.

Note that there can be more than one. The C calling convention was commonly used: the caller pushes data onto the stack, calls a function, and upon return the caller pops the data it pushed. The advantage of this is that for variable-length argument lists there's less chance of mismatch. A 'pascal' convention was popular on Windows and OS/2, where the called function would use a single instruction to return and pop N bytes. It also made the code slightly smaller, as the callers didn't need the pop instructions. That ABI was specific to the Windows and OS/2 GUI interfaces, but compilers would also let you choose for your own libs--I don't know/remember what the kernels used.
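
The point about variable-length argument lists can be seen from the C side. In this small sketch (the function name `sum_ints` is just an example), the callee has no built-in way to know how many arguments were passed. It trusts the `count` it is given, while the caller always knows exactly what it pushed, which is why caller-side cleanup pairs naturally with variadic functions.

```c
#include <stdarg.h>

/* Adds up 'count' int arguments; the callee relies entirely on 'count'. */
int sum_ints(int count, ...) {
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int); /* fetch the next argument as an int */
    }
    va_end(args);

    return total;
}
```

For example, `sum_ints(3, 1, 2, 3)` adds the three trailing arguments. How those arguments physically travel (stack or registers) is decided by the platform's calling convention, not by this source code.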

ktpsns 4 years ago

The term "API" is as fuzzy as the term "constant". Whether a variable or "storage bucket" is constant (vs. dynamic) depends heavily on the context: in many contexts there are, for instance, compile-time constants vs. runtime constants.

Similarly, you can refer to the operating system kernel interface as an "API". Some folks even call network protocols an API. In my humble opinion, in the end, it is just words.

gtirloni 4 years ago

I also have a hard time understanding abstract things without clear examples. I think that's the difficulty you're having?

This SO question has many answers that are a bit more concrete with the examples: https://stackoverflow.com/questions/3784389/difference-betwe...

phendrenad2 4 years ago

Syscalls are a type of ABI. But mostly people are talking about higher-level shared libraries.

ABI is a general term. The keyword is "binary". One chunk of compiled binary code wants to call a function in another chunk of compiled binary code. How does it do it? The OS has loaded both into the same memory space (perhaps a compiled C++ program and a .so file it depends on). How does the C++ code call functions inside the .so file?

It all comes down to placeholders. When you compile a C++ program that relies on the C++ standard library (say, std::cout), gcc will create an ELF file. A few key things: The ELF file will have a header that says "I need the C++ standard library. Also, I make calls to these C++ standard library functions. There are placeholders there for now. Please fill in these gaps when loading me."

Your OS, when it runs your C++ program, will see that header, and go load the C++ standard library .SO into the same memory space. Then it will find the parts of your code where you call the standard library, and replace the placeholder with the address of the standard library function, in the .SO file.
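
Here is a toy model of that placeholder idea in plain C. To be clear, this is not how a real loader is written, and every name in it is invented: it only mimics the pattern of a call going through a slot that initially points at a resolver, which then patches the slot with the real address.

```c
typedef int (*add_fn)(int, int);

/* Stands in for a function living in a shared library. */
static int real_add(int a, int b) { return a + b; }

static int resolve_add(int a, int b);

/* The "placeholder": starts out pointing at the resolver, not the real code. */
static add_fn add_slot = resolve_add;
static int resolve_count = 0;

/* On the first call: find the real function, patch the slot, forward the call. */
static int resolve_add(int a, int b) {
    resolve_count++;
    add_slot = real_add;
    return add_slot(a, b);
}

/* What the program's compiled code does: always call through the slot. */
int call_add(int a, int b) {
    return add_slot(a, b);
}
```

After the first `call_add`, the slot holds the real address and the resolver is never entered again. The binary-level agreement on where such slots live and how they get filled is part of what people mean by the ABI.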

That compiler/OS magic that happens behind the scenes is sometimes considered the mechanism of ABI. C++ has even more magic (since we have virtual functions and stuff). When you run a C++ program, before your main() even runs, a chunk of predefined C++ code runs, which loads vtables and stuff (IDK, I'm not a big C++ programmer).

"Break the ABI to save C++":

This video points out that this magic isn't really magic enough. C++ has a simplistic module loading mechanism, and if you compiled your C++ program to use Module v1, using Module v2 is likely impossible without recompiling your C++ code. This means that your OS needs to stock all versions of the library (ever run into Linux "error while loading shared libraries: libavformat.so.53: cannot open shared object file: No such file or directory" errors? That's what's going on. v53 of that library is missing. Even if you have v54, you can't use it with this program.)
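
One common reason a newer library version cannot simply be swapped in is that a struct shared between program and library changed its layout. This sketch is my own illustration, not something taken from the video, and the `widget` types are hypothetical. Code compiled against v1 has the v1 offset of `size` baked into it, so when it is handed a v2 object it silently reads the wrong field.

```c
#include <stddef.h>
#include <string.h>

/* Layout the program was compiled against. */
struct widget_v1 {
    int id;
    int size;
};

/* Layout in the newer library: a field was inserted before 'size'. */
struct widget_v2 {
    int id;
    int flags;
    int size;
};

/* Mimics already-compiled v1 code: it reads 'size' at the v1 offset,
 * whatever object it is actually given. */
int read_size_with_v1_layout(const void *widget) {
    int size;
    memcpy(&size, (const char *)widget + offsetof(struct widget_v1, size), sizeof size);
    return size;
}
```

With a v2 object whose fields are `{1, 7, 42}`, the v1 code sees 7 (the `flags` field) where it expected the size. Recompiling against the v2 header fixes it, which is exactly the "API compatible, ABI incompatible" situation.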

tifkap 4 years ago

An ABI is the low-level implementation of any programming interface, no matter if it uses interrupts (0x80), sysenter, or a long jump to code that the OS put there, etc.

It is the nuts and bolts that do the low level real work of implementing a call / interface.
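
A small Linux-only sketch of the two levels (the function names are made up): `getpid()` is the C library API, while `syscall(SYS_getpid)` asks glibc's generic `syscall` wrapper to issue the system call by number. Which instruction is actually used to enter the kernel, and which registers carry the number and the result, is the ABI part, and it is hidden inside the library.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* The API most programs use. */
pid_t pid_via_libc(void) {
    return getpid();
}

/* Same kernel service, requested by syscall number (Linux/glibc assumption). */
pid_t pid_via_raw_syscall(void) {
    return (pid_t)syscall(SYS_getpid);
}
```

Both functions return the same process ID. Only the path taken to reach the kernel differs.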

  • dankoncsOP 4 years ago

    Concise and good explanation. :)

    So the syscalls of a Linux/Unix machine are the same, b/c of the POSIX API. The POSIX API is a standard for *nix OSes.

    Now, we have compilers such as gcc, clang and Microsoft's C++ compiler. Do they each decide on their own ABI (a specification of how things should be implemented at the lowest levels), then?

    https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html

tifkap 4 years ago

ABI is what you use when a Papa process and a Mama kernel love each other very much.
