This is part II of my irregularly scheduled series on compiler optimization. In part I, I explained how the compiler can optimize away return statements, resulting in missed breakpoints. The workaround I gave for that problem, though effective, was very ugly and architecture-dependent, much like Cowboys Stadium:
(gdb) break *0x00001fc5 if $eax != 0
Although there’s not much we can do to prevent the compiler optimization, we can greatly simplify our conditional breakpoint. I had suggested rewriting the source code, which was awe-inspiringly prescient, because that’s what I’m going to do now. Here’s the original code:
8  if (ShouldReturn())
9      return;
And here’s the revised code:
8  int localVar = ShouldReturn();
9  if (localVar)
10     return;
The return at line 10 will still be optimized away. However, the revised code allows us to set a simple breakpoint at line 9 that will stop when we want:
(gdb) break 9 if localVar != 0
No knowledge of the architecture, machine registers, or assembly language is required.
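If you want to try this outside a larger project, here is a minimal C sketch of the revised pattern. ShouldReturn() is the function from the snippet above, but its body here is a made-up stand-in, and DoWork() and the two counters are hypothetical names added only so the example has something to observe.

```c
/* Hypothetical counters, added only for illustration. */
static int gCallCount = 0;
static int gWorkDone = 0;

/* Made-up stand-in for the real ShouldReturn(): nonzero on every third call. */
static int ShouldReturn(void)
{
    gCallCount++;
    return (gCallCount % 3) == 0;
}

/* The early-return pattern, with the result kept in a local variable
   so that a breakpoint condition can refer to it by name. */
static void DoWork(void)
{
    int localVar = ShouldReturn();
    if (localVar)   /* break on this line with the condition localVar != 0 */
        return;
    gWorkDone++;    /* stands in for whatever the function really does */
}
```

Calling DoWork() six times leaves gWorkDone at 4, because the third and sixth calls return early. Those are the two calls on which the conditional breakpoint would stop.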
From the beginning of time (January 1970, of course), programmers have struggled over coding style. Objective-C programmers, for example, expend undue effort arranging their brackets. (I have [NSMutableArray array] going to the Final Four.) For some, bracket-making becomes a kind of game or contest.
[[[[[[[[[[[[[See how] many] method] calls] we] can] fit] on] one] line] of] source] code];
I’ve changed my coding style over the years, but I’ve settled on one fundamental principle: write your code so that it’s easy to debug. All your fancy margin-aligning isn’t going to help when you need to figure out why your app keeps exploding. If you have nested method calls on one line of code, you can’t easily set a breakpoint in the middle. That’s why I prefer as much as possible to have only one method call per line of code, and create a local variable to store the return value.
There is a misconception that local variables are expensive, in terms of either computation or memory. The truth is that local variables are very cheap, the value meals of the computing world. (Would you like trans fat with your saturated fat?) It only takes one machine instruction to store a pointer in a local variable. One machine instruction is really quite fast, about as fast as you can get, at least with restrictor plates. With regard to memory, local variables only take up stack space. To create a local variable, you simply move the stack pointer a little. When the method or function returns, the stack pointer is moved back, and thereby the space reserved for local variables is automatically recovered. Of course, you don’t want to create large C arrays on the stack, but a pointer to an Objective-C object only takes 4 bytes on the stack for 32-bit, 8 bytes for 64-bit. The default stack size for the main thread is 8MB, so you’re not going to run out of space unless you have deeply recursive calls.
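To put rough numbers on that, here is a back-of-the-envelope sketch in C. The function names are mine, and the 8MB figure is just the default quoted above, not something the code measures.

```c
#include <stddef.h>

/* Bytes of stack used by a given number of pointer-sized locals. */
static size_t BytesForPointerLocals(size_t count, size_t pointerSize)
{
    return count * pointerSize;
}

/* How many pointer-sized locals would fit in a stack of the given size,
   ignoring everything else that lives on the stack (return addresses,
   saved registers, alignment padding). */
static size_t PointerLocalsThatFit(size_t stackSize, size_t pointerSize)
{
    return stackSize / pointerSize;
}
```

With an 8MB stack (8 * 1024 * 1024 bytes), PointerLocalsThatFit() gives 2,097,152 for 4-byte pointers and 1,048,576 for 8-byte pointers. Passing sizeof(void *) as the pointer size gives the figure for whatever architecture you compile for.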
Even these small costs are only relevant in the context of your app’s unoptimized, debug configuration. For your customers, on the other hand, local variables are free. As in Mumia, or Bird. When you compile your app using the release configuration, the local variables disappear: the compiler optimizes them away. (By the way, this is one of the reasons that debugging the release build of your app can be a frustrating and/or wacky experience.) To see the optimization in action, let’s consider some sample code:
1  #import <Foundation/Foundation.h>
2
3  @interface MyObject : NSObject {}
4  @end
5
6  @implementation MyObject
7
8  -(NSString *)myDirectProcessName {
9      return [[[NSProcessInfo processInfo] processName] lowercaseString];
10 }
11
12 -(NSString *)myRoundaboutProcessName {
13     NSString *myRoundaboutProcessName = nil;
14     NSProcessInfo *processInfo = [NSProcessInfo processInfo];
15     NSString *processName = [processInfo processName];
16     NSString *lowercaseString = [processName lowercaseString];
17     myRoundaboutProcessName = lowercaseString;
18     return myRoundaboutProcessName;
19 }
20
21 @end
22
23 int main(int argc, const char *argv[]) {
24     NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
25     MyObject *myObject = [[[MyObject alloc] init] autorelease];
26     NSLog(@"My direct process name: %@", [myObject myDirectProcessName]);
27     NSLog(@"My roundabout process name: %@", [myObject myRoundaboutProcessName]);
28     [pool release];
29     return 0;
30 }
The above code is obviously contrived and useless. It only has value for explanatory purposes, and perhaps in the app store for $0.99. The methods -myRoundaboutProcessName and -myDirectProcessName do the same thing, the former with and the latter without local variables. Here’s the i386 disassembly for the methods when compiled using the debug configuration:
-[MyObject myDirectProcessName]:
00001d2a nop
00001d2b nop
00001d2c nop
00001d2d nop
00001d2e nop
00001d2f nop
00001d30 pushl %ebp
00001d31 movl %esp,%ebp
00001d33 pushl %ebx
00001d34 subl $0x14,%esp
00001d37 calll 0x00001d3c
00001d3c popl %ebx
00001d3d leal 0x000012e8(%ebx),%eax
00001d43 movl (%eax),%eax
00001d45 movl %eax,%edx
00001d47 leal 0x000012e4(%ebx),%eax
00001d4d movl (%eax),%eax
00001d4f movl %eax,0x04(%esp)
00001d53 movl %edx,(%esp)
00001d56 calll 0x0000400a ; symbol stub for: _objc_msgSend
00001d5b movl %eax,%edx
00001d5d leal 0x000012e0(%ebx),%eax
00001d63 movl (%eax),%eax
00001d65 movl %eax,0x04(%esp)
00001d69 movl %edx,(%esp)
00001d6c calll 0x0000400a ; symbol stub for: _objc_msgSend
00001d71 movl %eax,%edx
00001d73 leal 0x000012dc(%ebx),%eax
00001d79 movl (%eax),%eax
00001d7b movl %eax,0x04(%esp)
00001d7f movl %edx,(%esp)
00001d82 calll 0x0000400a ; symbol stub for: _objc_msgSend
00001d87 addl $0x14,%esp
00001d8a popl %ebx
00001d8b leave
00001d8c ret
-[MyObject myRoundaboutProcessName]:
00001d8d nop
00001d8e nop
00001d8f nop
00001d90 nop
00001d91 nop
00001d92 nop
00001d93 pushl %ebp
00001d94 movl %esp,%ebp
00001d96 pushl %ebx
00001d97 subl $0x24,%esp
00001d9a calll 0x00001d9f
00001d9f popl %ebx
00001da0 movl $0x00000000,0xe8(%ebp)
00001da7 leal 0x00001285(%ebx),%eax
00001dad movl (%eax),%eax
00001daf movl %eax,%edx
00001db1 leal 0x00001281(%ebx),%eax
00001db7 movl (%eax),%eax
00001db9 movl %eax,0x04(%esp)
00001dbd movl %edx,(%esp)
00001dc0 calll 0x0000400a ; symbol stub for: _objc_msgSend
00001dc5 movl %eax,0xec(%ebp)
00001dc8 movl 0xec(%ebp),%edx
00001dcb leal 0x0000127d(%ebx),%eax
00001dd1 movl (%eax),%eax
00001dd3 movl %eax,0x04(%esp)
00001dd7 movl %edx,(%esp)
00001dda calll 0x0000400a ; symbol stub for: _objc_msgSend
00001ddf movl %eax,0xf0(%ebp)
00001de2 movl 0xf0(%ebp),%edx
00001de5 leal 0x00001279(%ebx),%eax
00001deb movl (%eax),%eax
00001ded movl %eax,0x04(%esp)
00001df1 movl %edx,(%esp)
00001df4 calll 0x0000400a ; symbol stub for: _objc_msgSend
00001df9 movl %eax,0xf4(%ebp)
00001dfc movl 0xf4(%ebp),%eax
00001dff movl %eax,0xe8(%ebp)
00001e02 movl 0xe8(%ebp),%eax
00001e05 addl $0x24,%esp
00001e08 popl %ebx
00001e09 leave
00001e0a ret
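As expected, -myRoundaboutProcessName makes more room on the stack than -myDirectProcessName. The extra 0x10 (16) bytes are exactly four 4-byte pointers, one for each of the four local variables: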
00001d34 subl $0x14,%esp
00001d97 subl $0x24,%esp
At 00001da0, -myRoundaboutProcessName sets the local variable myRoundaboutProcessName to nil, as in line 13 of the source code. (The disassembler prints the offset as the unsigned byte 0xe8; read as a signed value it is -0x18, so the variable sits 24 bytes below ebp.) The interesting differences, though, are immediately after the calls to objc_msgSend(). Under the standard i386 ABI, the register eax contains the return value of objc_msgSend(). In -myDirectProcessName, the value in eax is simply moved to the register edx:
00001d5b movl %eax,%edx
In contrast, -myRoundaboutProcessName first stores the value on the stack and then loads it back into edx. The address on the stack is the space reserved for the local variable processInfo:
00001dc5 movl %eax,0xec(%ebp)
00001dc8 movl 0xec(%ebp),%edx
After the final objc_msgSend() call, -myDirectProcessName doesn’t bother to do much, because the return value in eax will become the return value of the whole method. -myRoundaboutProcessName, on the other hand, has to store the value in local variables, as in lines 16 and 17 of the source code, and then load it back into eax for the return at line 18:
00001df9 movl %eax,0xf4(%ebp)
00001dfc movl 0xf4(%ebp),%eax
00001dff movl %eax,0xe8(%ebp)
00001e02 movl 0xe8(%ebp),%eax
So that’s how the methods differ in the unoptimized build. Now let’s see what happens when we use the release configuration. Here’s the optimized disassembly for -myDirectProcessName:
-[MyObject myDirectProcessName]:
00001dce pushl %ebp
00001dcf movl %esp,%ebp
00001dd1 subl $0x18,%esp
00001dd4 movl 0x00003000,%eax
00001dd9 movl %eax,0x04(%esp)
00001ddd movl 0x0000302c,%eax
00001de2 movl %eax,(%esp)
00001de5 calll 0x0000400a ; symbol stub for: _objc_msgSend
00001dea movl 0x00003004,%edx
00001df0 movl %edx,0x04(%esp)
00001df4 movl %eax,(%esp)
00001df7 calll 0x0000400a ; symbol stub for: _objc_msgSend
00001dfc movl 0x00003008,%edx
00001e02 movl %edx,0x0c(%ebp)
00001e05 movl %eax,0x08(%ebp)
00001e08 leave
00001e09 jmpl 0x0000400a ; symbol stub for: _objc_msgSend
The optimized method is significantly shorter, as expected from the compiler option -Os. First, you’ll notice that all those pesky nop instructions have been deleted. Stallman put them in unoptimized builds just to annoy us. (Or they may have been for Fix and Continue, but I always assume the worst.) There are additional optimizations as well that I won’t belabor here, because I’m eager to get to the climax. (Sorry, dear.) For your enlightenment and enjoyment, here’s the optimized disassembly for -myRoundaboutProcessName:
-[MyObject myRoundaboutProcessName]:
00001e0e pushl %ebp
00001e0f movl %esp,%ebp
00001e11 subl $0x18,%esp
00001e14 movl 0x00003000,%eax
00001e19 movl %eax,0x04(%esp)
00001e1d movl 0x0000302c,%eax
00001e22 movl %eax,(%esp)
00001e25 calll 0x0000400a ; symbol stub for: _objc_msgSend
00001e2a movl 0x00003004,%edx
00001e30 movl %edx,0x04(%esp)
00001e34 movl %eax,(%esp)
00001e37 calll 0x0000400a ; symbol stub for: _objc_msgSend
00001e3c movl 0x00003008,%edx
00001e42 movl %edx,0x0c(%ebp)
00001e45 movl %eax,0x08(%ebp)
00001e48 leave
00001e49 jmpl 0x0000400a ; symbol stub for: _objc_msgSend
Identical! Ah, that’s nice. Smoke ‘em if you got ‘em.
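You can run the same experiment without Objective-C. Below is a rough C analog, with hypothetical functions Twice(), PlusOne(), and Negate() standing in for the three message sends. Compiling it with something like cc -Os -S and comparing the two functions in the assembly output is a reasonable way to see what your own compiler does. I’m not promising byte-for-byte identical results from every compiler and version.

```c
/* Hypothetical stand-ins for the three message sends. */
int Twice(int x)   { return x * 2; }
int PlusOne(int x) { return x + 1; }
int Negate(int x)  { return -x; }

/* All three calls nested on one line. */
int DirectValue(int x)
{
    return Negate(PlusOne(Twice(x)));
}

/* One call per line, each result stored in a local variable. */
int RoundaboutValue(int x)
{
    int roundaboutValue = 0;
    int twice = Twice(x);
    int plusOne = PlusOne(twice);
    int negated = Negate(plusOne);
    roundaboutValue = negated;
    return roundaboutValue;
}
```

Both functions compute the same value for any input; DirectValue(5) and RoundaboutValue(5) each return -11. The only difference is how many places you have to hang a breakpoint.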
In conclusion, feel free to sprinkle, pepper, dash, or even drown your code with local variables. And with the engineering hours of debugging time you save, get me a nice (not free) present. I’m partial to flavored coffee and unflavored MacBooks.
This entry was posted on Saturday, December 19th, 2009 at 1:20 pm and is filed under Apple, C, Cocoa, Unix, Xcode.