Settings

Theme

Main is usually a function – when is it not?

jroweboy.github.io

338 points by b0b_d0e 11 years ago · 61 comments

Reader

fugyk 11 years ago

The reason that I use C many of times is not because of its speed but because it (almost) completely reveals it inner working even when there is clear reason not to do so.

(1) Though it is bad for most of time but still I feel more contended sometimes with it, especially for complex code.

(2) Even if this transparency is at superficial level with optimising compilers, but there are very few high level optimisation applied to C code and that too in the form of loop unrolling, tail call optimisation, loop interchange etc.

  • pjmlp 11 years ago

    For this type of stuff, I rather use a Macro Assembler.

    • jerf 11 years ago

      For this type of stuff, I use... NOTHING. NOTHING AT ALL. I DON'T DO IT. BECAUSE IT IS INSANE.

      Gloriously, wonderfully insane, though. My compliments to the chef.

ejk314 11 years ago

Reminds me of this 'Hello World': http://stackoverflow.com/questions/15182496/why-does-this-co...

I bet these two ideas would go well together.

  • masklinn 11 years ago

    Sadly the libc prng is not specified, and thus non-portable. It might work if things are checked on the same kernel and libc combination, or it might not. The Java PRNG is specified and globally portable.

  • vortico 11 years ago

    The seed for the random number generator is not executed though.

    • ejk314 11 years ago

      I was thinking more along the lines of making the main function (array) call some other function which does this in such a way that it looks like you implemented a random number generator but it actually prints "Hello World". And you could hide the fact that 'main' was a piece of code by making it the initial seed for for the random number generator and using macros in a separate header file to rename it to something like 'SEED'.

mmastrac 11 years ago

Next challenge... make it a real string!

https://0xcd80.wordpress.com/2011/04/29/linux-x86-shellcodin...

https://code.google.com/p/alpha3/

DanielBMarkham 11 years ago

Not sure the title's question was answered. At the end he just said "And all this time we’ve been ignoring the warning message about main not being a function :)"

I remember 20 years or ago writing my first C/C++ function where I popped out to inline ASM. I was like, wow! I'm in complete control of this box! (It was mostly copied code. My ASM-foo is weak) I really liked the fact that I could either code in these really abstract types -- or be concerned with what the stack looks like on entering a function. So I _think_ I know the answer to the question.

Instead of ignoring the error, I was thinking that you could just mangle the link file and make the entry point whatever you want, right? I could have a program that consisted of "void x", then continue along the lines of the articles with the number list.

Heck, once you start pulling the threads apart like this, the entire structure falls apart. Your source code could look like anything you want it to look like. And that's cool.

userbinator 11 years ago

I believe you could get around the problem of finding the address of the string by pushing 4 (8?)-byte pieces of it onto the stack and then doing "mov esi, esp".

On the topic of "executable ASCII", the EICAR test file is an interesting example: http://en.wikipedia.org/wiki/EICAR_test_file

  • Spl3en 11 years ago

    There is another known trick :

    Place your string after a "call" instruction, then when you are inside the call, the address of the string is on the stack, you can pop it in any register you want.

SixSigma 11 years ago

I like this little project of how one person delved into the fun.

But as far as the TA goes, most people who have an interest in C and programming will look at the source and say to themselves "Oh, byte values of machine code being executed, yes very clever but do the assignment again properly."

  • VBprogrammer 11 years ago

    If that is the case then the education system is thoroughly broken. It is obvious that a person who took the time to do that learned far more than they would than had they copied one of millions of Hello World examples from the internet.

    That being said, I wouldn't rule out a bored TA opening the file not seeing a printf and assigning a grade based on submitting something.

    • _sigma 11 years ago

      Not at all; TAs are paid poorly, have other more interesting work (i.e., their research) to do, and spending the time to deal with 'clever' is not a good usage of time. Sure the student learnt more, but that doesn't mean this is what they should have handed in. Have fun on your own time, but don't waste the TAs time.

      • VBprogrammer 11 years ago

        We might have to agree to disagree. If TAs aren't interested in students diving deeper into the course material then I'm not sure they should be paid at all.

        But perhaps the less radical approach would be to pay them enough such that they can take the time to judge the work on it's merit.

        • _sigma 11 years ago

          >But perhaps the less radical approach would be to pay them enough such that they can take the time to judge the work on it's merit.

          I assume you have not TAed: I'd like to agree with you, but the realities are it is poor pay for crappy work. Most students don't care about their work. Most students seem to be unable to follow basic instructions. I go out of my way to help students if they care and stay late at labs for these students, but I have no patience for wasting my time because someone wants to show off and deliberately make my life harder. As is, when I'm being paid for 6hrs of lab + marking/week, and I am already spending 8+hrs a week because students can't format their assignments properly, don't write in complete sentences, etc, I have zero interest in patting this guy on the back. Instead, I have to do more work for no pay so he can feel clever.

        • SixSigma 11 years ago

          I'm fairly sure that executing arrays is not part of the course material.

vortico 11 years ago

I wonder if one could write a header file containing definitions that would make the following possible.

  char main[] = {
  	movl(1, eax),
  	movl(1, ebx),
  	movl(message, esi),
  	movl(13, edx),
  	syscall(),
  	movl(60, eax),
  	xorl(ebx, ebx),
  	syscall(),
  };
Obviously there are some technical difficulties like handling literal values and code sections, but it could be a fun hack, and I've love to see what someone could come up with.
  • lmm 11 years ago

    With enough macros you can do anything. I'm not aware of anyone doing this in C, but this is a cool example of doing approximately the same thing in a more easily extensible language: http://wall.org/~lewis/2013/10/15/asm-monad.html

  • comex 11 years ago

    It's possible, but once you're not obfuscating the code - and already depending on OS details when you assume that global data can be executable, whether const or not - I think you may as well just use the inline assembler feature of your compiler. Well, if there is one... MSVC bizarrely doesn't support on x86-64 what was a perfectly good feature on x86.

    • vortico 11 years ago

      It might be an easier way to produce obfuscated code by reading the preprocessed output. But I see your point---it's reinventing the wheel in a different way. That's the fun of it of course.

    • TazeTSchnitzel 11 years ago

      It's not bizarre: inline asm hurts performance and requires more effort from compiler authors than is worth it in our modern age of intrinsics.

      • comex 11 years ago

        I admit I'm not too familiar with the old MSVC inline assembly system, but the way GCC/Clang do it certainly allows emitting identical code to what you can get with intrinsics, although using naive constraints might hurt you. However, the main reason I personally use inline asm is not for performance, but to access instructions which are not provided as intrinsics or require special register handling. For example, I recently wrote some code that did syscalls directly (because it was patching memory so the normal syscall functions might not be accessible); I could have linked a separate .S file, but inline assembly made the output look much nicer. Or, while I haven't written this myself, it can be used to write out nops which zero-cost tracing facilities will then patch, something which separate assembly files cannot do.

        As for effort from compiler authors, I'd like to hear more about why it is supposedly so hard. (MSVC is also the compiler which has to prioritize their choice of C++11-17 features to implement over the years, while the competitors have complete C++11 and 14 implementations. I guess that's somewhat ad hominem on the team, but ...)

        • TazeTSchnitzel 11 years ago

          Theoretically they lead to identical code, but I believe a block of inline ASM results in constraints on surrounding code and prevents some types of compiler optimisation.

Someone 11 years ago

See also http://www.ioccc.org/1984/mullender.c (30 years ago). Hint at http://www.ioccc.org/1984/mullender.hint

  • sirclueless 11 years ago

    Not having access to a PDP-11 (maybe this works on an emulator?) is it cheating to ask what the output of this program is?

lelandbatey 11 years ago

I'm impressed with the authors efforts, and share much of his pain when it comes to doing weird things with C.

For other obfuscated fun, see this poem/tree printer in 505 bytes[0], or this program:

    main(G){10<putchar(11^--G?89-(0x1F&2846984999544829581>>5*G):10)&&main(2+G);}
[0] - https://github.com/lelandbatey/tiny_tree_printer
  • gjm11 11 years ago

    OK, your one-liner is simple enough that maybe it's worth a few minutes to unpack it here.

    Let's begin with that big constant. The values of

      89-(0x1F&2846984999544829581>>5*G)
    
    for successive choices of G are: 76 69 76 65 78 68 66 65 84 69 89 74 87. "Obviously" these are character indices; the characters are L,E,L,A,N,D,B,A,T,E,Y,J,W. (I remark that Leland was the first name of the founder of Stanford University and conjecture that the author of the code works at, or studied at, Stanford. And then I look up at the username of the person who posted it and feel a bit stupid.)

    Now, what's the argument to putchar? It has the form

      11^--G ? thing1 : thing2
    
    so it decrements G, and then if the result is anything other than 11 it evaluates thing1; but if the result is 11 it evaluates thing2 instead. In this case, thing1 is our LELANDBATEY generator and thing2 is 10 (= newline).

    So far, so good. One more step out. What comes before the && inside main? It looks like this:

      10<putchar(stuff)
    
    and the return value from putchar is (barring I/O failures) the same as what's fed into it. So, putting the above together, when G isn't 12 we'll decrement G, write out a character from LELANDBATEYJW (indexed by G), and this thing will be true; when G is 12 we'll decrement G, write out a newline, and this thing will be false.

    Nearly there. Now the whole thing looks like

      main(G) { thing_above && main(2+G); }
    
    so when G isn't 12 it will decrement G, write out a character from LELANDBATEY, and immediately call main with G+2 (which is 1 more than G started out as); but if G is 12 on entry to main it will decrement G, write out a newline, and then return (perhaps to a previous stack frame which will also return, and so on until exit).

    OK. So, I guess this program is intended to be invoked with no command-line args. Then G (= argc) will be 1. We'll write out characters 0, 1, 2, ..., 10 of the LELANDBATEY string (getting exactly that far) at which point G will be 12 on entry to the next main call. Newline, exit. Done.

    (In case it isn't clear: it outputs "LELANDBATEY\n".)

    No offence, I hope, but this is really tame compared with typical short-short IOCCC entries. Here's an example from the 22nd IOCCC. It's short enough to tweet.

      char a;float b,c;main(d){for(;d>2e3*c?c=1,scanf(" %c%f",&a,&c),d=55-a%32*9/5,b=d>9,d=d%13-a/32*12:1;a=2)++d<24?b*=89/84.:putchar(a=b*d);}
    
    See http://www.ioccc.org/2013/endoh3/hint.html for more about this; it interprets tunes written in a simple musical notation and generates raw audio data.
    • lelandbatey 11 years ago

      The one-liner I posted is really meant to be just understandable enough to someone who's taken a solid first year computer science course with C and is looking for more. At least that's where I was approximately when I first found it. So yes, it's mean to be tame, more of a brainteaser than indecipherable.

      Also, I wrote a program to generate these. So here's one for you:

          main(O){10<putchar(5^--O?77-(31&2278424678>>5*O):10)&&main(2+O);}
      
      In fact, I originally found this style of tiny C program on Hacker News nearly two years ago. Here's the link to the original: https://news.ycombinator.com/item?id=5524428
ygra 11 years ago

I remember something similar that enabled you to embed machine code in VB6 programs, which involved a magic constant that was sort of an entry point into a string containing opcodes. I don't remember the specifics, it's been a while, but nowadays it's very likely a horrible no-go anyway because you're executing data.

Still, was a nifty trick that allowed you to get more performance out of a language that wasn't very fast to begin with.

  • evincarofautumn 11 years ago

    I remember QuickBasic (the precursor of VB: VB 1.0 was effectively QB 8.0) had a CALL ABSOLUTE statement, which let you execute machine code in a string. I used it to great effect in graphics demos, once upon a time.

wheaties 11 years ago

Wait, what kind of company forces one of their programmers to take an intro level comp sci course? If you're going to let people hire someone to do a job, maybe you should either trust their judgement or have someone senior work with them to fix obvious defects?

I don't know, something about that just didn't sit right with me.

  • Kudos 11 years ago

    That's almost certainly not what happened here. He was probably doing a Masters or something in a related field which had this requirement.

ctdonath 11 years ago

On a possibly comparable note: in C++, programs can be run entirely from the constructors of static global objects ... letting one write a program with

  #include "X.h"
  X x;
  void main(){}
mncolinlee 11 years ago

This is very cool. For the next assignment, work in a Duff's Device.

http://en.wikipedia.org/wiki/Duff%27s_device

derefr 11 years ago

I recall from somewhere:

   int main = 0;
This code compiles (cleanly!), but segfaults.
  • vortico 11 years ago

    I think that sets the address of main to NULL, so it segfaults as soon as `_start` jumps to `main`. It's known for being one of the smallest compilable C programs.

    EDIT: Your original post contained

      int main=0;
    • comex 11 years ago

      Not quite - what matters (normally) is the address of the symbol, not the bytes located there, since in the case of a real function those bytes would be the instructions. So this will either execute the bytes at &main as instructions (4 zero bytes, and whatever follows), or, more likely, crash due to memory protections, as described in the article.

      • zwegner 11 years ago

        Yup. You can get a working program with the simple:

          const int main = 0xC3;
        
        ...which is just a return. Or you can get fancy and make it exit successfully by clearing eax first:

          const int main = 0xC3C031;
    • firethief 11 years ago

      In pre-C99 C, that can be shortened to the equivalent:

         main=0;
belovedeagle 11 years ago

> My problem solving process is typically the same thing I imagine most programmers do.

I'm afraid I may be a victim of Poe's law here, but this attitude is... distressing. It completely discounts the value of problem solving, which, once upon a time, was the entire point of the profession.

  • mden 11 years ago

    It does not. It just moves what "the problem" is.

    An example of that is the article itself which leverages the gained information from SO to solve the real problem.

    Also arcane knowledge is not the same thing as problem solving.

    • belovedeagle 11 years ago

      > If not solved, try a different query and repeat.

      I'm not sure if you even bothered to read the article... since that was the very next sentence. I, too, wanted to give the author the benefit of the doubt and assume it was mere information gathering, which is a good thing, but the next sentence shows that the intent is to solve the problem from start to finish with Google.

  • ekimekim 11 years ago

    "Problem solving" is perhaps the wrong phrase here. It's more like "information gathering", which is an important step in problem solving.

dmritard96 11 years ago

In python it is still a function but it is also an object?

  • ekimekim 11 years ago

    Well more properly python doesn't have a "main" function. You just directly execute the initial script, which will import other modules, define functions and classes, etc, and finally (hopefully) execute something.

  • jarcane 11 years ago

    In C# main() is a method attached to the main Program class.

Buge 11 years ago

Wouldn't this be considered academic fraud? His coworker turning in code that he didn't write?

  • b0b_d0eOP 11 years ago

    Probably not, since this isn't a case where my co worker can just copy paste my code and hope it works for two reasons. The first is I am not entirely sure their testing environment is using 64 bit linux, it could be 32 bit linux since I'm not in the class, I just assumed its 64bit so I could test it on my machine. The next reason is they don't actually have a Hello World assignment at all! The reason I wrote this article is so I could share something I learned, and my co worker (who has already completed the first half of the semester's assignments) could use this as a starting point for how to write his own version for an assignment.

    • kps 11 years ago

      > The first is I am not entirely sure their testing environment is using 64 bit linux, it could be 32 bit linux

      Maybe it's ARM, or MIPS, or… Followup problem: write a sequence of bytes that performs a jump on one architecture, and something harmless or reversible on the other(s).

ecesena 11 years ago

Not working on Mac, but I'm gonna tweet it anyway :)

  • 0x44 11 years ago

    I spent a bit getting it to work on OS X. The ASM becomes:

        movq $0x2000004, %rax  ; // BSD syscalls are divided into classes.
        movq $1, %rdi          ; // 64-bit registers use %rdi instead of %ebx
        lea message(%rip), %rsi; // Relative Address of message
        movq $13, %rdx         ; // Same Length
        syscall                ; // x86-64 ASM syscall
    
        movq $0x2000001, %rax  ; // Exit Syscall
        movq $0, %rdi;
        syscall
    
        message: .ascii "Hello World!\n";
  • LukaAl 11 years ago

      echo 'const int main[]={3850979413,184,3234285568,33554436,29869896,1207959552,1652109,3343384576,3522,1208291072,114887,3343385088,199,1208291072,1869376613,1919899424,169960556};'>t.c&&gcc t.c&&./a.out
    
    But it is too long to tweet

    [Edit]

    Less compact but more readable:

      echo 'const int main[] = { 3850979413, 184, 3234285568, 33554436, 29869896, 1207959552, 1652109, 3343384576, 3522, 1208291072, 114887, 3343385088, 199, 1208291072, 1869376613, 1919899424, 169960556};' > test.c && gcc -Wall test.c -o test && ./test
    • LukaAl 11 years ago

      Ok, now also the Mac version is twittable:

        echo 'const int main[]={1208,114434,2370306048,2101,834048,84869120,1818577091,1461743468,1684828783,10};'|cc -xc -ot -&&./t
      
      The trick is to use retq instead of a sys call to exit and to strip some of the code that the compiler create when you call a function (pushq %rpb; moveq %rsp, %rbp; movl $0x0, eax;)

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection