TCP HTTP Server written in Assembly

canonical.org

154 points by thikonom 12 years ago · 73 comments

derefr 12 years ago

Cool stuff. Really, though, this is still relying on a rather large runtime library: the physical, data-link, and network-layer drivers.

What'd be really awesome to see is one of those operating system guides that shows you how to write an OS kernel, in assembler, that can speak HTTP. Even if you limited yourself to targeting the synthetic hardware of a VM program, it'd still be quite a feat.

Bonus points if the entire network stack has been flattened using the hand-rolled equivalent of stream-fusion. :)

  • aortega 12 years ago

    Here you go:

    http://www.kyllikki.org/hardware/wwwpic2/src/wwwpic2.asm.htm...

    Bonus: runs on 68 bytes of RAM. Not a typo, it's bytes, and it's a "complete" http+tcp/ip server.

    • mbell 12 years ago

      That is just running the TCP/IP layer speaking RS232 and relying on an external IC for lower layers. It's not at all what the GP is looking for.

      It should probably also be noted that a minimum TCP header, with no data attached, is 20 bytes, so implementing a 'full stack' in 68 bytes is a pretty strong indication that you're relying on off-SoC memory to handle the packet buffering.
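
      For reference, the 20-byte figure is the fixed part of the TCP header with no options. A minimal sketch of that layout follows; the function name and the example values are made up for illustration, and the checksum is simply left at zero rather than computed.

```python
import struct

# Fixed part of a TCP header with no options:
# ports (2+2), sequence (4), acknowledgment (4), offset/reserved (1),
# flags (1), window (2), checksum (2), urgent pointer (2) = 20 bytes.
TCP_HEADER = struct.Struct("!HHIIBBHHH")


def build_tcp_header(src_port, dst_port, seq, ack, flags, window):
    """Pack an option-less TCP header (hypothetical helper, checksum left at 0)."""
    data_offset = 5  # header length in 32-bit words: 5 * 4 = 20 bytes
    return TCP_HEADER.pack(
        src_port,
        dst_port,
        seq,
        ack,
        data_offset << 4,  # upper nibble: data offset; lower nibble: reserved
        flags,             # for example 0x02 for SYN
        window,
        0,                 # checksum, not computed in this sketch
        0,                 # urgent pointer
    )


header = build_tcp_header(12345, 80, 0, 0, 0x02, 1024)
print(len(header))
```

      An IPv4 header without options adds another 20 bytes in front of this.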

      • aortega 12 years ago

        I encourage you to read the source code instead of guessing how it may work.

    • csmuk 12 years ago

      Bet that handles TCP fragmentation well :)

      Nice work though to whoever crammed that in a PIC.

    • sillysaurus2 12 years ago

      Awesome. Can VirtualBox or VMWare run this?

      • cfallin 12 years ago

        From the first line of the source, it's a PIC (microcontroller) program, rather than something for x86 (which VirtualBox and friends emulate), so, no.

        There are a few open source PIC simulators -- e.g. [1] -- and I would guess you might be able to get it running, since the link layer is SLIP over a serial port. You'd just have to wire up the simulator's serial console in the right way.

        [1] http://gpsim.sourceforge.net/

  • kragen 12 years ago

    Agreed, and also the filesystem. I think you may be looking for lwIP and Contiki.

  • gwu78 12 years ago

    Are you saying you'd like to see string processing moved into the kernel?

  • proksoup 12 years ago

    omfg yes!

neverm0re 12 years ago

Here's another simpler implementation of an HTTP server in Linux x86 assembly from last year, coincidentally by the one who did the Seiken Densetsu 3/Secret of Mana 3 translation hack and the old Starscream 68k emulator:

http://www.neillcorlett.com/etc/mohttpd.asm.txt

And a not so successful thread to go with it: https://news.ycombinator.com/item?id=4714971

  • kragen 12 years ago

    That's very nice! Thanks! I spent a lot of last night figuring out how to use socketcall, and this should be helpful.

kragen 12 years ago

I hacked on httpdito some more, and it has been improved in several ways:

- it now forks so that it can handle multiple concurrent connections (up to a limit of 2048);

- it no longer uses libc at all, so it's down to 2088 bytes (I had it lower, but then I added forking);

- it's less complex now that it only has one way of invoking system calls instead of two;

- there are some performance results in the comments.

- it has a name, "httpdito";

- strlen works correctly.

Probably nobody will read this comment here, but I thought it was worth mentioning.

  • kragen 12 years ago

    Down to 1928 bytes now, and has timeouts for robustness. You can still DoS it but it takes more work.

zw123456 12 years ago

Very cool, I think. But cooler still: a web server on an FPGA, with no CPU, only VHDL: http://www.youtube.com/watch?v=7syu5EC1OWg

mappu 12 years ago

Cool!

My comments as an inexperienced assembly developer, assuming this is optimising for binary size:

- The pug/doN macros do an extra reg-reg copy if passed a register - and the recursive definition calls pop/pop/pop instead of just add %esp, -4*N, you could shave a few bytes

- AT&T syntax will always look weird to me, but the heavy use of macros and local labels is quite elegant

- A little bit of candid swearing in the comments? Fine by me, but is this officially associated with canonical?

  • aroman 12 years ago

    > - A little bit of candid swearing in the comments? Fine by me, but is this officially associated with canonical?

    Assuming you mean Canonical Ltd., the company behind Ubuntu, this has absolutely nothing to do with them; it is hosted on canonical.org, not canonical.com.

  • pbsd 12 years ago

    Agree, AT&T syntax was just not designed for human reading. I doubt this is too optimized for size, since there are obvious tricks that it misses.

    Another observation: the strlen code is incorrect, as it also counts the \0. We can fix this, and make the code 1 byte shorter (in glorious Intel syntax):

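        lea edi, source        ; depends on source; scasb reads through EDI
        xor ecx, ecx           ; 2 bytes; also clears CF
        salc                   ; 1 byte; AL = 0 because CF is clear
        cld                    ; 1 byte
        _back:
        scasb                  ; 1 byte
        loopnz _back           ; 2 bytes
        not ecx                ; 2 bytes; ECX = length, not counting the \0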
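
    To see why the final NOT yields the length without the terminator, here is a rough Python model of that loop. It is only a sketch: the function name is hypothetical, the string is assumed to be NUL-terminated, and register widths are modeled as 32 bits.

```python
MASK32 = 0xFFFFFFFF


def strlen_scasb_loopnz(memory: bytes, start: int = 0) -> int:
    """Model of: xor ecx,ecx / salc / cld / scasb + loopnz / not ecx."""
    ecx = 0      # xor ecx, ecx (this also clears the carry flag)
    al = 0       # salc sets AL to 0 because the carry flag is clear
    edi = start  # scasb reads the byte at [EDI]
    while True:
        zf = memory[edi] == al    # scasb: compare AL with the byte at [EDI]
        edi += 1                  # direction flag clear, so EDI moves forward
        ecx = (ecx - 1) & MASK32  # loopnz decrements ECX before testing
        if ecx == 0 or zf:        # loopnz falls through on ECX == 0 or ZF set
            break
    return ~ecx & MASK32          # not ecx


print(strlen_scasb_loopnz(b"hello\0"))
```

    After scanning n characters plus the terminator, ECX holds -(n + 1), and the bitwise NOT of that is n.
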
    • kragen 12 years ago

      Thank you! About the time you wrote this, I discovered that the strlen code was incorrect in a different way as well, and then I went to sleep. Sort of embarrassing.

    • kragen 12 years ago

      BTW, I've fixed the strlen code (although differently). I didn't know about SALC! That's a very clever way of zeroing AL.

      I think at this point I might be able to get away with CLD since I never STD any more :)

      Some of the obvious tricks it misses are probably because they're not obvious to me, while others may be just because I haven't gotten to them yet.

      • pbsd 12 years ago

        Your way is much cleaner; mine was just a size gimmick. I just can't resist it :)

  • derleth 12 years ago

    > you could shave a few bytes

    This is practically axiomatic in assembly language programming.

    It's just not worth it to turn your code into what you'd need to turn it into in order to make it as small (or as fast) as it can possibly be on that specific version of that specific microarchitecture from that specific manufacturer, such work being undone by the next version of the hardware.

    > AT&T syntax will always look weird to me

    AT&T syntax is meant to be a generic assembly language syntax; it's supposed to look equally weird to everyone, regardless of what CPU they're writing code for. GAS will accept Intel syntax, or a somewhat heterodox variant thereof. NASM is the usual assembler of choice on modern x86 Unix-a-likes, I think.

    http://www.nasm.us/

    > A little bit of candid swearing in the comments?

    Hey, if the Linux kernel devs can do it, why not them?

tokenizer 12 years ago

As a web developer who isn't familiar with assembly or any web server more barebones than nginx, what benefits does something like this provide? Speed? Could this be a solution for an extremely simple directory/static file web server?

  • anonymouscowar1 12 years ago

    This is a simple, single-threaded single-process accept-read-respond-loop web server. It's vulnerable to trivial trickle DoS attacks and probably has other issues. There are no advantages, the author just did this for fun.

    The TCP part comes from C code in the kernel, so this headline is a little misleading ;-).

    • kragen 12 years ago

      Agreed. However, it should be safe from buffer overflows, path traversal attacks, XSS, and obviously CSRF. It should be fine other than DoS. Let me know if you find any exceptions.

      • anonymouscowar1 12 years ago

        It's hard to be vulnerable to XSS and CSRF with all-static content, no?

        So, not only will a trickle DoS other clients, each byte will also force an O(n) traversal of $buf (burning CPU). Granted, buf is only 1000 bytes, but that's not great.

        It looks like a request with no space could force you to walk (`repne scasb`) through invalid memory after $buf. Also maybe corrupt it (unescape_request_path).

        It will also fail to correctly parse HTTP/0.9 (not a big deal, but part of spec). The parsing code ignores the existence of verbs other than GET. (Doesn't check that the verb is GET either.)

        We don't validate that paths start with /, we just skip that byte. Okay:

                mov (path), %al
                ...
                cmp $'/, %al
                je badreq
        
        Since valid GETs are of the form:

            GET /foo.txt HTTP/1.0
                 ^-- path=buf+5
        
        As you point out, a client close will cause SIGPIPE causing a crash (DoS).

        That's all I see. But I'm not an asm expert and I'm sure I've missed something.

        • kragen 12 years ago

          > It's hard to be vulnerable to XSS and CSRF with all-static content, no?

          You would think, but actually Apache managed to be vulnerable to XSS by including bits of the request URL in its error pages, if I remember right. Last millennium, I think.

          > So, not only will a trickle DoS other clients, each byte will also force an O(n) traversal of $buf (burning CPU). Granted, buf is only 1000 bytes, but that's not great.

          Hmm, while I hadn't thought about that, and I should have, I think that's probably okay; basically you're saying that you can get the machine to burn up to, say, 2048 cycles by sending it a small TCP packet. Which means that a 4-core 2GHz server machine can't handle more than about four million packets per second (well, one million until I parallelize), which is about 85 megabytes per second, or 680 megabits per second. There are probably other bottlenecks in the code, the kernel, or your data center that will kick in first. It's probably more effective to DoS the server by just requesting files from it.

          > It looks like a request with no space could force you to walk (`repne scasb`) through invalid memory after $buf.

          It's possible I could have gotten this wrong, but I did try to limit the number of bytes it would scan to the bytes that it had actually read, by doing

              mov (bufp), %ecx
          
          before the repne scasb. Did I screw that up?

          > HTTP/0.9 ...verbs other than GET.

          Yes, those are unimplemented features, and you're right that their lack makes the server behave incorrectly; hopefully they don't result in security bugs. I think they don't matter in practice, since nobody sends HTTP/0.9 requests or HEAD requests, except by hand, do they?

          > We don't validate that paths start with /, we just skip that byte.

          Right. And the $'/ check below is to keep you from saying

              GET //etc/passwd HTTP/1.0
          
          and getting /etc/passwd. In case that matters in 2013.

          Thank you very much for looking over it!
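
          The leading-slash handling discussed above can be sketched in Python. This illustrates the idea only and is not httpdito's code: the function name is hypothetical, and it deliberately copies the behavior described in the thread (skip "GET /" without validating it, then reject a second slash).

```python
from typing import Optional


def relative_path(request_line: bytes) -> Optional[bytes]:
    """Return the path with its leading '/' skipped, or None for a bad request.

    The first five bytes are assumed to be "GET /" and are skipped without
    being checked, as described in the thread.
    """
    rest = request_line[5:]        # path = buf + 5
    path = rest.split(b" ", 1)[0]  # the path ends at the next space
    if path.startswith(b"/"):      # "GET //etc/passwd" would otherwise be absolute
        return None
    return path


print(relative_path(b"GET /foo.txt HTTP/1.0"))
print(relative_path(b"GET //etc/passwd HTTP/1.0"))
```

          The sketch does nothing about ".." segments or percent-escapes, which a real server also has to handle.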

          • bebna 12 years ago

            Doesn't ab send HEAD requests?

            I know it does this when given a flag, but in some tests I have seen some HEADs between my GETs. I haven't used ab for a long time, so don't quote me on that. Have you tried httpress[1] as a benchmark tool?

            How about a simple check that the first byte equals G (decimal 71) to confirm it is a GET? Shouldn't be that expensive, I think.

            Thanks for creating it.

            [1] https://bitbucket.org/yarosla/httpress/wiki/Home

            • kragen 12 years ago

              I don't know if ab sends HEAD requests! Thanks for the link to httpress; I've been having trouble with ab failing at high concurrencies (1000 concurrent connections) and also being the bottleneck.

          • anonymouscowar1 12 years ago

            > before the repne scasb. Did I screw that up?

            Ah, it's possible repne scasb halts when ecx drops to zero (that would explain some of the string length asm code I found when I googled it). I'm not very familiar with x86 mnemonics apart from the basics ('mov').
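
            That is indeed how the REPNE prefix behaves: it repeats the string instruction while ECX is nonzero and ZF is clear, decrementing ECX on each iteration. A small Python model of the bounded scan, with a hypothetical function name and the direction flag assumed clear:

```python
def repne_scasb(memory: bytes, edi: int, ecx: int, al: int):
    """Model REPNE SCASB; returns (edi, ecx, zf) as left by the instruction.

    If ECX starts at 0 the real instruction leaves the flags untouched;
    this model just reports zf as False in that case.
    """
    zf = False
    while ecx != 0:             # REPNE tests ECX before each iteration
        zf = memory[edi] == al  # SCASB compares AL with the byte at [EDI]
        edi += 1
        ecx -= 1
        if zf:                  # stop at the first match
            break
    return edi, ecx, zf


# A request containing a space: the scan stops just past it.
print(repne_scasb(b"GET /x", 0, 6, ord(" ")))
# No space within the first 10 bytes: the scan stops when ECX reaches 0.
print(repne_scasb(b"GETnospace beyond", 0, 10, ord(" ")))
```

            So loading ECX with the number of bytes actually read keeps the scan inside the data that was received.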

          • anonymouscowar1 12 years ago

            > four million packets per second (well, one million until I parallelize), which is about 85 megabytes per second

            Why is this more than 4 million bytes per second (4 MB/s)? A packet can contain a single byte.

            • kragen 12 years ago

              To be a valid TCP packet, it needs to contain at a minimum a 20-byte IP header and a 20-byte TCP header, plus the one byte of payload. In practice your server is probably receiving the packet over Ethernet, so it probably has an Ethernet header and things like that, too, but that's a minimum. You could approach it over, say, SLIP.
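
              The arithmetic in this subthread can be worked through directly. The inputs below are the figures quoted in the thread (4 cores, 2 GHz, up to 2048 cycles per packet, 20 + 20 + 1 bytes per packet); the function names are just for illustration.

```python
def packets_per_second(cores: int, clock_hz: int, cycles_per_packet: int) -> int:
    """Upper bound on packets handled if each one costs cycles_per_packet."""
    return cores * clock_hz // cycles_per_packet


def bytes_per_second(pps: int, bytes_per_packet: int) -> int:
    """Traffic needed to sustain pps packets of the given size."""
    return pps * bytes_per_packet


IP_HEADER = 20   # minimum IPv4 header, bytes
TCP_HEADER = 20  # minimum TCP header, bytes
PAYLOAD = 1      # one byte of request data per packet

pps = packets_per_second(cores=4, clock_hz=2_000_000_000, cycles_per_packet=2048)
wire = bytes_per_second(pps, IP_HEADER + TCP_HEADER + PAYLOAD)
print(pps, wire, wire * 8)
```

              With these inputs the result is 3,906,250 packets per second, which at 41 bytes per packet is about 160 MB/s (roughly 1.28 Gbit/s). The earlier figure of about 85 MB/s is close to what 21 bytes per packet would give, so it appears to count only one 20-byte header; that reading is an inference, not something stated in the thread.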

  • knappador 12 years ago

    This is normally the kind of question I ask about anything involving HTML/CSS only or JS only =D PoCs based on low-level concepts are the ones that make you curious about everything from top to bottom. Even though assembly is the least abstract and most esoteric of programming spaces (some would argue the opposite), the program actually reveals itself quite quickly once you know just a few tidbits. This is how you get to see that even the most low-level aspects of programming are quite accessible.

  • kragen 12 years ago

    Nginx will almost certainly be faster, and is somewhat robust against DoS attacks. I didn't write this to provide benefits. There are situations where this would work better than nginx (where, say, you don't want to spend any time configuring anything) but there are better existing solutions for those cases.

  • hcarvalhoalves 12 years ago

    This is obviously a toy, or PoC.

    • ars_technician 12 years ago

      I don't know about calling it a piece of crap. Seems a little harsh. :-)

      • optymizer 12 years ago

        I don't normally see 'piece of crap' written as PoC. I'm used to seeing PoS used often. I usually see PoC as 'proof of concept'.

      • jamestanderson 12 years ago

        It could mean proof of concept.

      • tinco 12 years ago

        Haha, imagine what tech news would be like if every time you saw PoC you thought it meant piece of crap. Quite hilarious actually.

      • UNIXgod 12 years ago

        Definitely meant it's ROFL web scale, asynchronous, non-blocking, event driven, message passing, nosql, sharded, clusters of highly available, reliable, high performance, real time, bad ass, rockstar, get the girls, get the funding, get the IPO, impress your mom, impress your cat ... applications.

pmiller2 12 years ago

Neat little piece of performance art (pun intended).

radikalus 12 years ago

No full tcp stack in assembly? =p

(Yes there's no point as it's better in hardware blah blah)

Vektorweg 12 years ago

I'm really happy that executable size doesn't matter for server software, because Yesod produces really big executables.

mikkom 12 years ago

> Depends on the C libraries.

^ That tells everything you need to know.

pekk 12 years ago

And I just finished rewriting all my large webapps in some obscure Java framework for performance, because of some benchmarks I saw on HN. Guess now I have to rewrite it all in assembly, because more performance is always better, right?

  • yeukhon 12 years ago

    Well, the JVM might be doing something smarter than your assembler-from-scratch code.

  • anonymouscowar1 12 years ago

    This is not a very fast webserver. Anything using sendfile() and threads/processes will beat it handily.

    • kragen 12 years ago

      I haven't measured, but I'm pretty sure you're right.

      • anonymouscowar1 12 years ago

        Me too. You could probably find a single-threaded, small file benchmark where they compare similarly (or this even compares better — it does almost nothing). But this is not most benchmarks. Large files or multiple clients will bench this server poorly compared to MT + sendfile(2).

        This server is single threaded and artificially serializes requests, at a minimum. The copy through userspace is going to hurt compared to sendfile for larger files.
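
        The difference between the two approaches can be sketched in Python. This is an illustration, not a benchmark: the function names are hypothetical, and `socket.sendfile` only uses the kernel's sendfile where `os.sendfile` is available, falling back to plain `send()` elsewhere.

```python
import os
import socket
import tempfile


def send_via_userspace(conn: socket.socket, path: str, chunk: int = 65536) -> int:
    """Copy the file through a userspace buffer: read(), then send()."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            conn.sendall(data)
            sent += len(data)
    return sent


def send_via_sendfile(conn: socket.socket, path: str) -> int:
    """Hand the file to the socket and let the kernel move the bytes."""
    with open(path, "rb") as f:
        return conn.sendfile(f)


def demo() -> bytes:
    """Send a small temporary file over a local socket pair both ways."""
    a, b = socket.socketpair()
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello world")
    try:
        send_via_userspace(a, tmp.name)
        first = b.recv(64)
        send_via_sendfile(a, tmp.name)
        second = b.recv(64)
    finally:
        a.close()
        b.close()
        os.unlink(tmp.name)
    return first + second


print(demo())
```

        Both paths deliver the same bytes; the userspace version does an extra copy into and out of the process for every chunk.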

        • kragen 12 years ago

          I made it fork. Now, on my netbook, it's able to handle in the neighborhood of a thousand requests per second and 20 megabytes per second, with up to 2048 concurrent connections. Not, I think, spectacular performance, but acceptable for many purposes. You can still DoS it by opening 2048 concurrent connections to it; as long as they are open, it will open no new connections, and it has no timeout.

          This has bloated the executable up to 2088 bytes.

meshko 12 years ago

OMG all these macros. It looks more like Python than Assembly. Come on, real men do not use macros.

  • UNIXgod 12 years ago

    Ahh yes, real programmers twiddle bits on their hard drive with nothing more than a tiny magnet:

    http://www.cs.utah.edu/~elb/folklore/mel.html

  • kragen 12 years ago

    Thank you very much! This is the nicest comment in this entire thread!

  • asmman1 12 years ago

    You're right. Real men don't use Assembly, either; they use binary instead. :)

  • derleth 12 years ago

    > Come on, real men do not use macros.

    The sexism and historical ignorance in this sentence are in a race to see which can be more breathtaking.

    Regardless of which wins, meshko will look like a complete fool to anyone who knows what they're talking about.

    • meshko 12 years ago

      Would the joking intent of the usage of men-centric idiom be more clear if I append "now get off my lawn" to the comment?

    • meshko 12 years ago

      Wait, are you serious?

      • derleth 12 years ago

        Yes. I am. The fact you weren't makes the sexism all the more odious.

        • meshko 12 years ago

          Can you explain? I am genuinely curious as to your line of thought now.

          • derleth 12 years ago

            > Can you explain? I am genuinely curious as to your line of thought now.

            The worst forms of bias and discrimination are unexamined, because they can fester and influence thought and action without ever being questioned. It's difficult to argue someone out of a position they don't even realize is a position that is up for argument.

    • derleth 12 years ago

      Because downvoters don't win:

      > Come on, real men do not use macros.

      The sexism and historical ignorance in this sentence are in a race to see which can be more breathtaking.

      Regardless of which wins, meshko will look like a complete fool to anyone who knows what they're talking about.

puppetmaster3 12 years ago

Likely does not have any back door. Rumor is GCC opens back door for you know who.
