Main is usually a function – when is it not?
jroweboy.github.ioThe reason that I use C many of times is not because of its speed but because it (almost) completely reveals it inner working even when there is clear reason not to do so.
(1) Though it is bad for most of time but still I feel more contended sometimes with it, especially for complex code.
(2) Even if this transparency is at superficial level with optimising compilers, but there are very few high level optimisation applied to C code and that too in the form of loop unrolling, tail call optimisation, loop interchange etc.
For this type of stuff, I rather use a Macro Assembler.
For this type of stuff, I use... NOTHING. NOTHING AT ALL. I DON'T DO IT. BECAUSE IT IS INSANE.
Gloriously, wonderfully insane, though. My compliments to the chef.
Reminds me of this 'Hello World': http://stackoverflow.com/questions/15182496/why-does-this-co...
I bet these two ideas would go well together.
Sadly the libc prng is not specified, and thus non-portable. It might work if things are checked on the same kernel and libc combination, or it might not. The Java PRNG is specified and globally portable.
The seed for the random number generator is not executed though.
I was thinking more along the lines of making the main function (array) call some other function which does this in such a way that it looks like you implemented a random number generator but it actually prints "Hello World". And you could hide the fact that 'main' was a piece of code by making it the initial seed for for the random number generator and using macros in a separate header file to rename it to something like 'SEED'.
Next challenge... make it a real string!
https://0xcd80.wordpress.com/2011/04/29/linux-x86-shellcodin...
const char main[] = "UH\x89\xe5\xb8\1\0\0\0\xbb\1\0\0\0g\x8d" "5\x10\0\0\0\xba\r\0\0\0\xf\5\xb8\x3c\0\0\0" "1\xdb\xf\5Hello World!\n]\xc3";I should have been more clear -- something more alphanumeric than ASCII.
Tangentially related: http://en.wikipedia.org/wiki/EICAR_test_file
Not sure the title's question was answered. At the end he just said "And all this time we’ve been ignoring the warning message about main not being a function :)"
I remember 20 years or ago writing my first C/C++ function where I popped out to inline ASM. I was like, wow! I'm in complete control of this box! (It was mostly copied code. My ASM-foo is weak) I really liked the fact that I could either code in these really abstract types -- or be concerned with what the stack looks like on entering a function. So I _think_ I know the answer to the question.
Instead of ignoring the error, I was thinking that you could just mangle the link file and make the entry point whatever you want, right? I could have a program that consisted of "void x", then continue along the lines of the articles with the number list.
Heck, once you start pulling the threads apart like this, the entire structure falls apart. Your source code could look like anything you want it to look like. And that's cool.
I believe you could get around the problem of finding the address of the string by pushing 4 (8?)-byte pieces of it onto the stack and then doing "mov esi, esp".
On the topic of "executable ASCII", the EICAR test file is an interesting example: http://en.wikipedia.org/wiki/EICAR_test_file
There is another known trick :
Place your string after a "call" instruction, then when you are inside the call, the address of the string is on the stack, you can pop it in any register you want.
I like this little project of how one person delved into the fun.
But as far as the TA goes, most people who have an interest in C and programming will look at the source and say to themselves "Oh, byte values of machine code being executed, yes very clever but do the assignment again properly."
If that is the case then the education system is thoroughly broken. It is obvious that a person who took the time to do that learned far more than they would than had they copied one of millions of Hello World examples from the internet.
That being said, I wouldn't rule out a bored TA opening the file not seeing a printf and assigning a grade based on submitting something.
Not at all; TAs are paid poorly, have other more interesting work (i.e., their research) to do, and spending the time to deal with 'clever' is not a good usage of time. Sure the student learnt more, but that doesn't mean this is what they should have handed in. Have fun on your own time, but don't waste the TAs time.
We might have to agree to disagree. If TAs aren't interested in students diving deeper into the course material then I'm not sure they should be paid at all.
But perhaps the less radical approach would be to pay them enough such that they can take the time to judge the work on it's merit.
>But perhaps the less radical approach would be to pay them enough such that they can take the time to judge the work on it's merit.
I assume you have not TAed: I'd like to agree with you, but the realities are it is poor pay for crappy work. Most students don't care about their work. Most students seem to be unable to follow basic instructions. I go out of my way to help students if they care and stay late at labs for these students, but I have no patience for wasting my time because someone wants to show off and deliberately make my life harder. As is, when I'm being paid for 6hrs of lab + marking/week, and I am already spending 8+hrs a week because students can't format their assignments properly, don't write in complete sentences, etc, I have zero interest in patting this guy on the back. Instead, I have to do more work for no pay so he can feel clever.
I'm fairly sure that executing arrays is not part of the course material.
I wonder if one could write a header file containing definitions that would make the following possible.
char main[] = {
movl(1, eax),
movl(1, ebx),
movl(message, esi),
movl(13, edx),
syscall(),
movl(60, eax),
xorl(ebx, ebx),
syscall(),
};
Obviously there are some technical difficulties like handling literal values and code sections, but it could be a fun hack, and I've love to see what someone could come up with.With enough macros you can do anything. I'm not aware of anyone doing this in C, but this is a cool example of doing approximately the same thing in a more easily extensible language: http://wall.org/~lewis/2013/10/15/asm-monad.html
Being able to compile assembly directly into a C program is a very useful tool for allowing optimizations across embedded assembly. One way around it is use of intrinsics: http://en.wikipedia.org/wiki/Intrinsic_function which is basically what we're talking about.
It's possible, but once you're not obfuscating the code - and already depending on OS details when you assume that global data can be executable, whether const or not - I think you may as well just use the inline assembler feature of your compiler. Well, if there is one... MSVC bizarrely doesn't support on x86-64 what was a perfectly good feature on x86.
It might be an easier way to produce obfuscated code by reading the preprocessed output. But I see your point---it's reinventing the wheel in a different way. That's the fun of it of course.
It's not bizarre: inline asm hurts performance and requires more effort from compiler authors than is worth it in our modern age of intrinsics.
I admit I'm not too familiar with the old MSVC inline assembly system, but the way GCC/Clang do it certainly allows emitting identical code to what you can get with intrinsics, although using naive constraints might hurt you. However, the main reason I personally use inline asm is not for performance, but to access instructions which are not provided as intrinsics or require special register handling. For example, I recently wrote some code that did syscalls directly (because it was patching memory so the normal syscall functions might not be accessible); I could have linked a separate .S file, but inline assembly made the output look much nicer. Or, while I haven't written this myself, it can be used to write out nops which zero-cost tracing facilities will then patch, something which separate assembly files cannot do.
As for effort from compiler authors, I'd like to hear more about why it is supposedly so hard. (MSVC is also the compiler which has to prioritize their choice of C++11-17 features to implement over the years, while the competitors have complete C++11 and 14 implementations. I guess that's somewhat ad hominem on the team, but ...)
Theoretically they lead to identical code, but I believe a block of inline ASM results in constraints on surrounding code and prevents some types of compiler optimisation.
See also http://www.ioccc.org/1984/mullender.c (30 years ago). Hint at http://www.ioccc.org/1984/mullender.hint
Not having access to a PDP-11 (maybe this works on an emulator?) is it cheating to ask what the output of this program is?
According to http://www.ioccc.org/faq.html, it is equivalent to
So, it moves a smiley on the screen, at increasing speed.main(){ int i=512; do write(1," :-)\b\b\b\b",9),usleep(i); while(--i); }
I'm impressed with the authors efforts, and share much of his pain when it comes to doing weird things with C.
For other obfuscated fun, see this poem/tree printer in 505 bytes[0], or this program:
main(G){10<putchar(11^--G?89-(0x1F&2846984999544829581>>5*G):10)&&main(2+G);}
[0] - https://github.com/lelandbatey/tiny_tree_printerOK, your one-liner is simple enough that maybe it's worth a few minutes to unpack it here.
Let's begin with that big constant. The values of
for successive choices of G are: 76 69 76 65 78 68 66 65 84 69 89 74 87. "Obviously" these are character indices; the characters are L,E,L,A,N,D,B,A,T,E,Y,J,W. (I remark that Leland was the first name of the founder of Stanford University and conjecture that the author of the code works at, or studied at, Stanford. And then I look up at the username of the person who posted it and feel a bit stupid.)89-(0x1F&2846984999544829581>>5*G)Now, what's the argument to putchar? It has the form
so it decrements G, and then if the result is anything other than 11 it evaluates thing1; but if the result is 11 it evaluates thing2 instead. In this case, thing1 is our LELANDBATEY generator and thing2 is 10 (= newline).11^--G ? thing1 : thing2So far, so good. One more step out. What comes before the && inside main? It looks like this:
and the return value from putchar is (barring I/O failures) the same as what's fed into it. So, putting the above together, when G isn't 12 we'll decrement G, write out a character from LELANDBATEYJW (indexed by G), and this thing will be true; when G is 12 we'll decrement G, write out a newline, and this thing will be false.10<putchar(stuff)Nearly there. Now the whole thing looks like
so when G isn't 12 it will decrement G, write out a character from LELANDBATEY, and immediately call main with G+2 (which is 1 more than G started out as); but if G is 12 on entry to main it will decrement G, write out a newline, and then return (perhaps to a previous stack frame which will also return, and so on until exit).main(G) { thing_above && main(2+G); }OK. So, I guess this program is intended to be invoked with no command-line args. Then G (= argc) will be 1. We'll write out characters 0, 1, 2, ..., 10 of the LELANDBATEY string (getting exactly that far) at which point G will be 12 on entry to the next main call. Newline, exit. Done.
(In case it isn't clear: it outputs "LELANDBATEY\n".)
No offence, I hope, but this is really tame compared with typical short-short IOCCC entries. Here's an example from the 22nd IOCCC. It's short enough to tweet.
See http://www.ioccc.org/2013/endoh3/hint.html for more about this; it interprets tunes written in a simple musical notation and generates raw audio data.char a;float b,c;main(d){for(;d>2e3*c?c=1,scanf(" %c%f",&a,&c),d=55-a%32*9/5,b=d>9,d=d%13-a/32*12:1;a=2)++d<24?b*=89/84.:putchar(a=b*d);}The one-liner I posted is really meant to be just understandable enough to someone who's taken a solid first year computer science course with C and is looking for more. At least that's where I was approximately when I first found it. So yes, it's mean to be tame, more of a brainteaser than indecipherable.
Also, I wrote a program to generate these. So here's one for you:
In fact, I originally found this style of tiny C program on Hacker News nearly two years ago. Here's the link to the original: https://news.ycombinator.com/item?id=5524428main(O){10<putchar(5^--O?77-(31&2278424678>>5*O):10)&&main(2+O);}
I remember something similar that enabled you to embed machine code in VB6 programs, which involved a magic constant that was sort of an entry point into a string containing opcodes. I don't remember the specifics, it's been a while, but nowadays it's very likely a horrible no-go anyway because you're executing data.
Still, was a nifty trick that allowed you to get more performance out of a language that wasn't very fast to begin with.
I remember QuickBasic (the precursor of VB: VB 1.0 was effectively QB 8.0) had a CALL ABSOLUTE statement, which let you execute machine code in a string. I used it to great effect in graphics demos, once upon a time.
Wait, what kind of company forces one of their programmers to take an intro level comp sci course? If you're going to let people hire someone to do a job, maybe you should either trust their judgement or have someone senior work with them to fix obvious defects?
I don't know, something about that just didn't sit right with me.
That's almost certainly not what happened here. He was probably doing a Masters or something in a related field which had this requirement.
On a possibly comparable note: in C++, programs can be run entirely from the constructors of static global objects ... letting one write a program with
#include "X.h"
X x;
void main(){}This is very cool. For the next assignment, work in a Duff's Device.
I recall from somewhere:
int main = 0;
This code compiles (cleanly!), but segfaults.I think that sets the address of main to NULL, so it segfaults as soon as `_start` jumps to `main`. It's known for being one of the smallest compilable C programs.
EDIT: Your original post contained
int main=0;Not quite - what matters (normally) is the address of the symbol, not the bytes located there, since in the case of a real function those bytes would be the instructions. So this will either execute the bytes at &main as instructions (4 zero bytes, and whatever follows), or, more likely, crash due to memory protections, as described in the article.
Yup. You can get a working program with the simple:
...which is just a return. Or you can get fancy and make it exit successfully by clearing eax first:const int main = 0xC3;const int main = 0xC3C031;
In pre-C99 C, that can be shortened to the equivalent:
main=0;
> My problem solving process is typically the same thing I imagine most programmers do.
I'm afraid I may be a victim of Poe's law here, but this attitude is... distressing. It completely discounts the value of problem solving, which, once upon a time, was the entire point of the profession.
It does not. It just moves what "the problem" is.
An example of that is the article itself which leverages the gained information from SO to solve the real problem.
Also arcane knowledge is not the same thing as problem solving.
> If not solved, try a different query and repeat.
I'm not sure if you even bothered to read the article... since that was the very next sentence. I, too, wanted to give the author the benefit of the doubt and assume it was mere information gathering, which is a good thing, but the next sentence shows that the intent is to solve the problem from start to finish with Google.
"Problem solving" is perhaps the wrong phrase here. It's more like "information gathering", which is an important step in problem solving.
In python it is still a function but it is also an object?
Well more properly python doesn't have a "main" function. You just directly execute the initial script, which will import other modules, define functions and classes, etc, and finally (hopefully) execute something.
In C# main() is a method attached to the main Program class.
Wouldn't this be considered academic fraud? His coworker turning in code that he didn't write?
Probably not, since this isn't a case where my co worker can just copy paste my code and hope it works for two reasons. The first is I am not entirely sure their testing environment is using 64 bit linux, it could be 32 bit linux since I'm not in the class, I just assumed its 64bit so I could test it on my machine. The next reason is they don't actually have a Hello World assignment at all! The reason I wrote this article is so I could share something I learned, and my co worker (who has already completed the first half of the semester's assignments) could use this as a starting point for how to write his own version for an assignment.
> The first is I am not entirely sure their testing environment is using 64 bit linux, it could be 32 bit linux
Maybe it's ARM, or MIPS, or… Followup problem: write a sequence of bytes that performs a jump on one architecture, and something harmless or reversible on the other(s).
Not working on Mac, but I'm gonna tweet it anyway :)
I spent a bit getting it to work on OS X. The ASM becomes:
movq $0x2000004, %rax ; // BSD syscalls are divided into classes. movq $1, %rdi ; // 64-bit registers use %rdi instead of %ebx lea message(%rip), %rsi; // Relative Address of message movq $13, %rdx ; // Same Length syscall ; // x86-64 ASM syscall movq $0x2000001, %rax ; // Exit Syscall movq $0, %rdi; syscall message: .ascii "Hello World!\n";Nice. Thx.
But it is too long to tweetecho 'const int main[]={3850979413,184,3234285568,33554436,29869896,1207959552,1652109,3343384576,3522,1208291072,114887,3343385088,199,1208291072,1869376613,1919899424,169960556};'>t.c&&gcc t.c&&./a.out[Edit]
Less compact but more readable:
echo 'const int main[] = { 3850979413, 184, 3234285568, 33554436, 29869896, 1207959552, 1652109, 3343384576, 3522, 1208291072, 114887, 3343385088, 199, 1208291072, 1869376613, 1919899424, 169960556};' > test.c && gcc -Wall test.c -o test && ./testOk, now also the Mac version is twittable:
The trick is to use retq instead of a sys call to exit and to strip some of the code that the compiler create when you call a function (pushq %rpb; moveq %rsp, %rbp; movl $0x0, eax;)echo 'const int main[]={1208,114434,2370306048,2101,834048,84869120,1818577091,1461743468,1684828783,10};'|cc -xc -ot -&&./t