Ask HN: What is hand-coded assembly language used for these days?

32 points by bkovitz 16 years ago · 59 comments · 1 min read


To put that another way, in the current marketplace, what kinds of programs are so worthy of optimization that it's economically sensible to have a human spend several days hand-tuning machine language to squeeze out every CPU cycle?

chadaustin 16 years ago

IMVU hand-rolled its SSE skinning loops and parts of the software 3D lighting code, because only 2/3 of our customers have GPUs. We need to run well on five-year-old Dells with Intel graphics. (Direct3D on Intel isn't as good as a dedicated software renderer. We chose RAD's Pixomatic.)

In addition, look at how popular netbooks are becoming. The Intel Atom is an in-order CPU. Imagine a hyperthreaded, 1.6 GHz 486...

On the iPhone it's even worse. It's got a decent vector unit, but the CPU is very slow. You'll see great wins by doing your 3D math yourself.

As we continue to become multicore, I could imagine somebody shaving a couple cycles out of the core message passing routines, though you're almost certainly bus bound in those situations...

Computers are getting smaller and people want more out of them; assembly language is back in style!

  • chadaustin 16 years ago

    Oh, I forgot about another huge case for writing assembly language... These days, we have tons of different languages talking to each other, all in the same program. In Mozilla, for example, JavaScript talks to C++ objects via XPCOM/XPIDL. Since the C++ objects expect data laid out on the C stack in a certain order and JavaScript has no notion of the C stack, there is a bit of platform-specific assembly code in the middle that takes the JavaScript values, places them on the stack, and jumps into the C++.

    I'm guessing that most languages with built-in foreign function interfaces (like Python's ctypes) have similar thunking layers.

luu 16 years ago

I did some assembly optimization for an internal RTL-level simulator. We had ~1000 machines on a three-year upgrade cycle, i.e., we upgraded ~333 machines/year, roughly $333k/year at about $1k per machine. Let's say I cost the company $200k/year. Several days = perhaps $2k, so I'd only need to get a 0.6% speedup for it to be worth it, not even including the cost of powering and maintaining our machines.

When I worked on it, our simulator was an order of magnitude faster than commercially available simulators (Synopsys VCS and Cadence NC-Verilog), which cost between $1k and $10k per license per year. I worked for a tiny hardware startup; established hardware companies use a few orders of magnitude more compute power than we did, so the equation is probably at least four orders of magnitude further in favor of doing assembly optimization in a commercial simulator.

  • bkovitzOP 16 years ago

    Thanks! Especially for the numbers. I had not even heard of RTL simulation before. Wow, extremely cool.

DarkShikari 16 years ago

Anything that's worth spending time to do fast is worth spending time writing SIMD assembly for.

You can get 5x, 10x, 20x, or more performance increases just by using the vector instructions given to you by the CPU. Until a magic compiler appears that can make proper use of them (read: never), hand-coded assembly will be critical for almost any application for which performance is critical, especially multimedia processing.

  • gruseom 16 years ago

    A compiler can't be made to make proper use of vector instructions? Why is that?

    • DarkShikari 16 years ago

      It is an extraordinarily difficult problem to transform scalar code into vector instructions. The only way to get even passable output from a vectorizing compiler is to write the code as vectors to begin with, such as with cross-platform assembly tools like Orc.

      And even then you'll often end up significantly worse off than if you wrote the assembly by hand.

      A run of Intel's compiler on the C versions of our DSP functions resulted in a grand total of one vectorization, which was done terribly, too.

      • jrockway 16 years ago

        The problem is that you used C, which doesn't have any syntax to represent meta-information about the problem you're trying to solve. When you write out C code to, say, add a list of numbers, it's hard for the compiler to optimize that. But it's very easy for the compiler when you tell it "sum this list of numbers".

    • JoachimSchipper 16 years ago

      As I understand - and I may be mistaken - the problem is that it's very hard to make sure that the emitted code is correct in the presence of raw pointers à la C.

      I am told that Fortran does better than C here; there is a reason it is still widely used in the scientific computing community, after all.

      This is also part of the reasoning behind C99's new restrict keyword.

    • angelbob 16 years ago

      In many cases it can, but then you have to be sure that it keeps doing so. You can write test cases for that, but in many cases it's easier to verify by just using the instructions directly.

      If you don't write the compiler, you have to make assumptions about when and how it can/will use those instructions, and often you assume wrong, particularly across compiler upgrades.

  • ntoshev 16 years ago

    Well, you can still use C with GCC and Intel intrinsics I guess.

    • DarkShikari 16 years ago

      Intrinsics are still worse (performance-wise) than hand-coded assembly, for two reasons.

      The first reason is the whole category of optimizations that the compiler is worse than a human at (like register allocation) or cannot do effectively at all (messing with calling conventions, computed jumps).

      The second reason is more subtle: in any case that you abstract yourself from some part of a problem, you inherently create a less efficient solution.

      For example, intrinsics mean that you don't have to manually allocate registers. But this also means that if your algorithm uses too many registers and it would be more efficient to modify it to require fewer (and thus not need spills), you will have no way of knowing such a thing. By insulating yourself from that layer of complexity, you've also limited your ability to make higher-level optimizations that improve lower-level performance.

      This applies on practically every level possible: any method of abstraction, no matter how well designed, will always in some fashion reduce the maximum performance you can achieve. Of course, this doesn't mean abstraction is bad--it provides an often-useful tradeoff between developer time and performance.

      • ntoshev 16 years ago

        I agree with your arguments, but they apply to non-vector code as well. Perhaps higher level programming using the right primitives is good enough in most cases.

a-priori 16 years ago

Signal processing algorithms on the phones made by a certain company I worked at are mostly written in assembly. The cellular protocols, at least those that use time-division (e.g. GSM), have strict real-time constraints, but mostly they use assembly because every microsecond you can shave off those algorithms is a microsecond you can sleep and conserve power.

maryrosecook 16 years ago

Joshua Bloch, Chief Java Architect at Google, says in Coders at Work:

"But for the absolute core of the system—the inner loops of the index servers, for instance—very small gains in performance are worth an awful lot. When you have that many machines running the same piece of code, if you can make it even a few percent faster, then you’ve done something that has real benefits, financially and environmentally. So there is some code that you want to write in assembly language."

DCoder 16 years ago

Debugging and reverse engineering games.

When publishers/developers don't give a bleep, the fans take up the task of fixing the bugs themselves. I happen to run one such project in my spare time (for C&C: Red Alert 2), and it's amazing how much stuff is broken. It's not as "serious" as other projects mentioned here, but still a reason to know ASM. (And a good way to see bad programming practices in action :) )

  • smiler 16 years ago

    People are still playing C&C: RA2?!

    • listic 16 years ago

      Come on, people are playing all kinds of games!

      I'm sure people still play Master of Magic, a strategy game from 1994. I've been playing it on and off since it came out, and I began to wonder: is there something wrong with me that I like this old game so much? I mean, surely there must be newer games that are better. I showed it to my teenage brother in the mid-2000s and he loved it too. The original Heroes of Might and Magic (1995) blew away my non-computer-gaming friends as well.

      I think the world needs better means for preservation of old computer games.

andrewf 16 years ago

Going the other way around - can anyone think of an open source project that could benefit from some assembly optimization, and isn't? I'd love an excuse to play with this stuff in a useful fashion.

(I love what I do, but my twelve year old self would be disgusted that I'm not writing games.)

  • jermy 16 years ago

    Video playback and compression are sufficiently cool, and could benefit from optimisation. Have a look to see if any help is needed in the ffmpeg tree at the moment.

bilbo0s 16 years ago

Medical Imaging and Oil Exploration. A lot of the really fast packages are using ARB Assembly instead of GLSL to minimize the number of instructions per voxel. It adds up if you are doing 4D imaging in real time for instance.

nvoorhies 16 years ago

In addition to the optimization reasons, you also end up coding assembly by hand to tickle features in the verification and bringup of new processors and/or processor architectures.

Since a lot of the bugs therein may be dependent on a certain sequence of instructions, doing it in a high level language doesn't make any sense.

reedlaw 16 years ago

Microcontroller firmware. There are many examples of AVR code in assembly on the web. I learned assembly this way. It really makes sense when you're working on bare hardware with no abstraction layers in the way. Also, it's useful for time-critical applications such as creating video signals or audio processing.

angelbob 16 years ago

GPGPU stuff -- that is, using your graphics processor for random programming tasks. While something like CUDA (http://www.nvidia.com/object/cuda_home.html) reduces the need to write assembly-like code, it also reduces the available speed substantially.

For that matter, CUDA (and ATI's Bare-Metal Interface, which is similar) is more assembly-like than C-like in many ways. So even using the higher-level available language is still pretty much like assembly.

You tend to only write these things when you're going to be running a lot of elements through, so almost everything you do in these platforms is inner-loop, or you'd be using a different tool. So even small speed-ups tend to matter.

mfukar 16 years ago

Contrary to popular belief, assembly is not only used for performance. Ask the security industry for more info.

  • tptacek 16 years ago

    That has more to do with reading assembly than writing it by hand.

    • JoachimSchipper 16 years ago

      There are certainly people who write shellcode. As I understand it, people have written shellcodes that use only bytes that happen to map to ASCII, are obfuscated to bypass intrusion detection systems, and so on. I'm sure it requires quite a bit of (specialized) knowledge.

      • tptacek 16 years ago

        That was more common in the late '90s than it is now (and note that it involves knowing only a very few instructions; enough to call a function or the system call gate).

        There are occasional exploits that can't be pieced together out of other people's shellcode, but there are also perhaps 10 people in the world that write those exploits.

      • lallysingh 16 years ago

        Well, more like bytecode that doesn't contain a zero-byte, which'd stop a string dead-on.

        • tptacek 16 years ago

          In '96 when I wrote the Crispin IMAP server bug, I can't remember which way it was but you either couldn't have uppercase letters, or could only have uppercase letters, in the shellcode. I thought I was kind of badass for writing that code. Of course, by '99, that was a triviality.

          Just saying, it's not just NUL.

    • mfukar 16 years ago

      Actually, no. If you're dealing with obfuscation, IDS and antivirus evasion etc. you need to know how to read, write and otherwise manipulate assembly code (debug, self-modification, name-it).

      • tptacek 16 years ago

        First, for dealing with obfuscation and evasion, you're reading, not writing.

        Second, for every 100 people that talk about e.g. self-modifying viruses or shellcode, there is perhaps 1 person who can actually write something soup-to-nuts, and maybe 5-10 more who can modify that code to make it do something new.

        Reading assembly is important for security research. Writing, not as much.

rythie 16 years ago

It's used in bits of kernel programming - often because there is no other way to do the task.

  • tptacek 16 years ago

    Modern kernels have very little assembly, outside of things like locore. They've heavily abstracted away the things you'd normally write in assembly, like modifying MSRs; also, so much of what you do now is simply memory mapped.

    In all of xnu, not counting AES, there are ~17kloc in x86 assembly, most of it in osfmk/i386 --- where no normal developer is ever going to go. There are over 730kloc in C.

    • rythie 16 years ago

      I wasn't implying there was a lot, just that sometimes that's the only way.

      • tptacek 16 years ago

        I'm just saying, even as a kernel dev, you're unlikely to need to write things in assembly.

Locke1689 16 years ago

I work in virtual machine development, so a portion of the interface code for hardware virtualization I wrote in straight ASM. This is not (exactly) for speed reasons, though; it's just impossible to touch the hardware at that level in C. :)

daeken 16 years ago

Compiler intrinsics, binary patches and hooks (although EasyHook has made assembly a rarity here outside of the occasional shim where odd calling conventions are used), in-process debuggers, low-level bootloaders, hardware initialization/management, various thunking mechanisms.

Others have covered the optimization side of things well so I won't repeat it, but there are tiny fragments of assembly all over the place -- they hold your system together.

vabmit 16 years ago

I've done it for cryptography code and cryptanalysis code. Specifically, optimizing code to take advantage of specific instructions available in certain processors or to make use of vector registers and instructions. I wrote my programs in C and then went back and wrote assembly for parts of the code that could deliver a significant overall speedup with hand optimization.

One place I did this was various RSA Challenge attack clients.

bkovitzOP 16 years ago

Thanks to all for the many informed and detailed replies!

I am now assistant-teaching a college course in low-level computer programming. It's an excellent course: the students reprogram a children's toy robot that uses the ARM processor. http://www.amazon.com/Little-Tikes-Giggles-Remote-Control/dp... They're getting up to speed very quickly on how to get hardware to actually do stuff.

Yes, I actually left Silicon Valley to do grad school. I haven't given up the principle of "do real stuff, see real results", though. I'm looking to design a couple fairly small homework assignments consisting of optimizing some ARM code. I want the examples to be real. Now mulling over which to do...

gte910h 16 years ago

Lots of the time, small embedded programs, especially on underpowered micros, see this sort of attention.

Additionally, low-level hardware interfacing is often done with hand-coded assembly, because it is easier to "get right" than C on some of the crappy compiler toolchains you face.

tptacek 16 years ago

Debugging and performance monitoring.

jonah 16 years ago

Inner loops of graphics algorithms - Picasa, Photoshop, etc.

  • onewland 16 years ago

    Yes.

    Tangentially related but not quite the same, I work at a company that makes barcode recognition software and some of our most performance-sensitive areas use assembly. It is mostly C, though.

scumola 16 years ago

I've re-written many Perl things in C to speed up processing time - for me, it's better than buying more/newer hardware. Also, for smallish scripts that I invoke millions of times, or small Perl scripts that do regexps, I can rewrite those in C to boost speed as well. I don't do any ASM code anymore, but C is a really good optimization step for me and my projects.

brg 16 years ago

Supporting backwards compatibility in vtable lookups. This is becoming increasingly important with COM.

nirmal 16 years ago

Not a marketplace use, but the most recent use I've had for Assembly was in an Atari 2600 programming class.

http://nirmalpatel.com/hacks/atari.html

njn 16 years ago

Some weirdos just plain like it more than high-level languages. One of those weirdos is developing the Linoleum language: http://anywherebb.com/bb/index.php?l=D4JeGEdhacS6Srr6QLQgZpW...

  • zokier 16 years ago

    I, for one, think that Assembly is just beautiful in its simplicity.

    • daeken 16 years ago

      Having read and written assembly on a daily basis for years, I have to disagree entirely. The only simple thing about assembly is that it happens to map to machine code directly, but macros and quasi-instructions even make that iffy. There are so many idiosyncrasies in every ISA, so many ways in which the code you write has side effects. Assembly isn't just complex in practice, it's complex in concept.

      If you want simplicity, you look at lisps; homoiconicity is perhaps the most elegant, simple concept known in computing. It may be more complex in practice (many more layers above the bare metal), but in concept it's simply beautiful.

      • axod 16 years ago

        Try ARM. x86 assembly is ugly and wart ridden. ARM is like a breath of fresh air. Unbelievably well designed.

        • bensummers 16 years ago

          And amusingly it's possible to write some significant desktop software, just in ARM assembler.

          http://www.cconcepts.co.uk/products/publish.htm http://www.cconcepts.co.uk/products/artworks.htm

          Although I wouldn't recommend choosing assembler for a desktop app today, it made sense when they were written. And I still think Impression beats many a word processor and DTP package available today, and it ran in 2MB of memory with no hard disc.

        • daeken 16 years ago

          I learned ARM while working on iPhone reversing, and it's certainly nicer, but there are still a ton of considerations. It's much nicer to write, but when reversing it you have to handle so many edge cases it's not even funny. Writing a decompiler for it really drives that home.

        • bkovitzOP 16 years ago

          Testify!

          15 years ago, I did Intel-style assembly--and loved it even with all its clumsiness. But I just dove into ARM a few weeks ago, and am loving it even more. Such sweet pleasure to code so close to such a beautiful and simple machine!

        • loup-vaillant 16 years ago

          Stack-based machines are even simpler. Too bad Forth didn't take off.
