gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies. A new forum can be found here.

ASM > What exactly is the compiler doing?

#29364 - sgeos - Fri Nov 19, 2004 2:26 am

This is my C code:
Code:
int x(int a, int b)
{
   return a - b;
}

int main(void)
{
   int a[2] = {12, 77};

   x(a[0], a[1]);
   return 0;
}


This is the ASM output. (gcc -S test.c)
Code:
   .file   "test.c"
   .text
   .align   2
   .global   x
   .type   x,function
x:
   @ args = 0, pretend = 0, frame = 8
   @ frame_needed = 1, uses_anonymous_args = 0
   mov   ip, sp
   stmfd   sp!, {fp, ip, lr, pc}
   sub   fp, ip, #4
   sub   sp, sp, #8
   str   r0, [fp, #-16]
   str   r1, [fp, #-20]
   ldr   r2, [fp, #-16]
   ldr   r3, [fp, #-20]
   rsb   r3, r3, r2
   mov   r0, r3
   ldmea   fp, {fp, sp, pc}
.Lfe1:
   .size   x,.Lfe1-x
   .section   .rodata
   .align   2
.LC0:
   .word   12
   .word   77
   .text
   .align   2
   .global   main
   .type   main,function
main:
   @ args = 0, pretend = 0, frame = 8
   @ frame_needed = 1, uses_anonymous_args = 0
   mov   ip, sp
   stmfd   sp!, {fp, ip, lr, pc}
   sub   fp, ip, #4
   sub   sp, sp, #8
   ldr   r3, .L3
   ldmia   r3, {r0-r1}
   sub   r2, fp, #16
   stmda   r2, {r0-r1}
   ldr   r0, [fp, #-20]
   ldr   r1, [fp, #-16]
   bl   x
   mov   r3, #0
   mov   r0, r3
   ldmea   fp, {fp, sp, pc}
.L4:
   .align   2
.L3:
   .word   .LC0
.Lfe2:
   .size   main,.Lfe2-main
   .ident   "GCC: (GNU) 3.2.2 (DevKit Advance R5 Beta 3)"


It looks like the compiler moves variables/parameters into a physical memory location and then loads them up again. What is up with the whole "mov ip, sp" / "stmfd sp!, {fp, ip, lr, pc}" deal? Why are only three values being popped from the stack? Why is 8 being subtracted from the stack pointer?

I think the array initialization is kind of nifty.

-Brendan

#29367 - sajiimori - Fri Nov 19, 2004 2:56 am

I wouldn't bother trying to decipher compiler output unless you're using at least -O2.

#29370 - DekuTree64 - Fri Nov 19, 2004 4:07 am

That is the craziest way to do one subtraction I have ever seen. It took me quite a while to even figure out why that works at all.
Apparently fp is mainly used for debugging, and it's storing your parameters relative to it so the debugger can look them up. Loading them back again is a total waste no matter how you look at it, though.
Also strange is that it would decide to use a reverse subtract instead of a normal one, and then mov to r0 instead of just putting the result of the subtract there. Not to mention that return sequence. ldmea (the same instruction as ldmdb) decrements the address before each load. fp points to the 'pc' that was pushed onto the stack (4 less than the original sp), so it loads the lr that was pushed just below it into pc, then the pushed ip (which had just been set to the original sp) into sp, then the original fp.
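
For anyone who wants to trace that by hand, here is a rough C model of the unoptimized x() from the first post. It is only a sketch: the Regs struct, the mem array and the run_x, load and store helpers are made up for illustration, memory is a 16-word toy stack, and the value stored for pc is just whatever is passed in (real hardware stores an address a few instructions ahead).

```c
#include <stdint.h>

/* Toy model, not real hardware: 16 words of "stack", indexed by byte address / 4. */
static uint32_t mem[16];

typedef struct {
    uint32_t r0, r1, r2, r3, fp, ip, sp, lr, pc;
} Regs;

static void store(uint32_t addr, uint32_t value) { mem[addr / 4] = value; }
static uint32_t load(uint32_t addr) { return mem[addr / 4]; }

/* Steps through the unoptimized x() one instruction at a time. */
static Regs run_x(Regs r)
{
    uint32_t base;

    r.ip = r.sp;                 /* mov   ip, sp                        */
    r.sp -= 16;                  /* stmfd sp!, {fp, ip, lr, pc}         */
    store(r.sp + 0, r.fp);       /*   lowest-numbered register lowest   */
    store(r.sp + 4, r.ip);       /*   ip still holds the caller's sp    */
    store(r.sp + 8, r.lr);
    store(r.sp + 12, r.pc);
    r.fp = r.ip - 4;             /* sub   fp, ip, #4  (-> saved pc)     */
    r.sp -= 8;                   /* sub   sp, sp, #8  (two int slots)   */
    store(r.fp - 16, r.r0);      /* str   r0, [fp, #-16]                */
    store(r.fp - 20, r.r1);      /* str   r1, [fp, #-20]                */
    r.r2 = load(r.fp - 16);      /* ldr   r2, [fp, #-16]                */
    r.r3 = load(r.fp - 20);      /* ldr   r3, [fp, #-20]                */
    r.r3 = r.r2 - r.r3;          /* rsb   r3, r3, r2  (r3 = r2 - r3)    */
    r.r0 = r.r3;                 /* mov   r0, r3                        */

    base = r.fp;                 /* ldmea fp, {fp, sp, pc}              */
    r.fp = load(base - 12);      /*   caller's fp                       */
    r.sp = load(base - 8);       /*   saved ip = caller's sp            */
    r.pc = load(base - 4);       /*   saved lr = return address         */
    return r;
}
```

If sp starts at 64, the frame covers bytes 40 to 63: 16 bytes for the four pushed registers plus 8 bytes for the two int slots, which is where the "sub sp, sp, #8" comes from. Only three values are loaded on return because the pushed pc is never read back, and nothing has to add the 8 back because sp is restored from the saved ip.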

I'm a little curious to see what it will do with debugging turned off and optimizations on.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

#29375 - sgeos - Fri Nov 19, 2004 5:45 am

Is there a better way to turn off debugging?
gcc -O2 -g0 -S test.c
Code:
   .file   "test.c"
   .text
   .align   2
   .global   x
   .type   x,function
x:
   @ args = 0, pretend = 0, frame = 0
   @ frame_needed = 0, uses_anonymous_args = 0
   @ link register save eliminated.
   rsb   r0, r1, r0
   @ lr needed for prologue
   mov   pc, lr
.Lfe1:
   .size   x,.Lfe1-x
   .section   .rodata
   .align   2
.LC0:
   .word   12
   .word   77
   .text
   .align   2
   .global   main
   .type   main,function
main:
   @ args = 0, pretend = 0, frame = 0
   @ frame_needed = 1, uses_anonymous_args = 0
   ldr   r2, .L3
   mov   ip, sp
   stmfd   sp!, {r4, fp, ip, lr, pc}
   ldmia   r2, {r3-r4}
   sub   fp, ip, #4
   mov   r1, r4
   mov   r0, r3
   bl   x
   mov   r0, #0
   ldmea   fp, {r4, fp, sp, pc}
.L4:
   .align   2
.L3:
   .word   .LC0
.Lfe2:
   .size   main,.Lfe2-main
   .ident   "GCC: (GNU) 3.2.2 (DevKit Advance R5 Beta 3)"


-Brendan

#29377 - allenu - Fri Nov 19, 2004 5:59 am

Basically, what's happening is that in main, the code is creating local variables on the stack. It allocates room for them and then loads up the constant values (12 and 77) from another location in memory. Then, in anticipation of the function call to x(), it has to load the parameters into the r0 and r1 registers, which I assume are the registers used for passing the first two parameters.

Inside the function x(), it has to copy the contents from r0 and r1 to local variables 'a' and 'b', which are on the stack. Now the totally redundant step: it copies them to registers r2 and r3. It does the subtraction using these registers and copies the result to r0, which is used as the 'return value' register.

I'm not familiar with ARM assembly, but it looks to me that stmfd and ldmea are for storing and restoring multiple registers on the stack.
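
The store-then-reload pattern described above can be written out in C. This is just an illustration, not what GCC does internally: x_unoptimized, x_optimized and the slot_ and t names are invented here, and volatile is used only to force each value through memory the way the unoptimized build does.

```c
/* Roughly the shape of the unoptimized x(): every parameter gets a memory
 * slot, and every use reads that slot back into a register. */
int x_unoptimized(int a, int b)
{
    volatile int slot_a = a;   /* str r0, [fp, #-16] */
    volatile int slot_b = b;   /* str r1, [fp, #-20] */
    int t2 = slot_a;           /* ldr r2, [fp, #-16] */
    int t3 = slot_b;           /* ldr r3, [fp, #-20] */

    t3 = t2 - t3;              /* rsb r3, r3, r2 */
    return t3;                 /* mov r0, r3 */
}

/* The shape of the -O2 version: the arguments never leave their registers. */
int x_optimized(int a, int b)
{
    return a - b;              /* rsb r0, r1, r0 */
}
```

Both functions return the same value for the same arguments; only the amount of memory traffic differs.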

#29378 - allenu - Fri Nov 19, 2004 6:01 am

sgeos wrote:
Is there a better way to turn off debugging?
gcc -O2 -g0 -S test.c

Wow, the optimizer did a good job here. It got rid of the unnecessary local variables. As far as I can tell, there is no debugging code in this or the original assembly output.

#29406 - sajiimori - Fri Nov 19, 2004 6:59 pm

If you want to see the compiler do a really good job, then declare 'x' static, compile with -O3, -fomit-frame-pointer, and rename 'main' to something else so it won't generate startup code. If you're using GCC 3.x, the results are almost humorous.
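
A guess at what the modified source would look like. The thread never shows this file, so treat it as a reconstruction: the name maine is taken from the assembly sgeos posts later in the thread, and everything else is the original test.c with 'x' made static.

```c
/* Reconstruction of the modified test file (assumed, not posted in the thread). */
static int x(int a, int b)
{
    return a - b;
}

/* Renamed from main so the compiler treats it as an ordinary function. */
int maine(void)
{
    int a[2] = {12, 77};

    x(a[0], a[1]);   /* result is never used */
    return 0;
}
```

Because x is static, nothing outside this file can call it, and because its result is unused and it has no side effects, the compiler is free to drop both the call and the function.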

#29408 - isildur - Fri Nov 19, 2004 7:55 pm

Does anyone use -O3 for real code? I always heard that it's something not to do because it over-optimizes and can create bugs.

#29410 - sajiimori - Fri Nov 19, 2004 8:12 pm

Any differences in behavior are the result of compiler bugs. GCC has gotten more solid over the years, and we use -O3 in our studio.

#29419 - sgeos - Fri Nov 19, 2004 8:58 pm

sajiimori wrote:
If you want to see the compiler do a really good job, then declare 'x' static, compile with -O3, -fomit-frame-pointer, and rename 'main' to something else so it won't generate startup code. If you're using GCC 3.x, the results are almost humorous.
lol. I'll post the results for others.

gcc -fomit-frame-pointer -O3 -g0 -S test2.c
Code:
   .file   "test2.c"
   .section   .rodata
   .align   2
.LC0:
   .word   12
   .word   77
   .text
   .align   2
   .global   maine
   .type   maine,function
maine:
   @ args = 0, pretend = 0, frame = 0
   @ frame_needed = 0, uses_anonymous_args = 0
   @ link register save eliminated.
   @ lr needed for prologue
   mov   r0, #0
   mov   pc, lr
.Lfe1:
   .size   maine,.Lfe1-maine
   .ident   "GCC: (GNU) 3.2.2 (DevKit Advance R5 Beta 3)"

Compiling with these options does, of course, defeat the purpose of the exercise. I'm trying to figure out what the compiler is doing, and why it is working. In what order are registers pushed onto and popped off of the stack?

-Brendan

#29421 - allenu - Fri Nov 19, 2004 9:13 pm

Ha, nice. It's basically removed the unnecessary call to x() as the results aren't used by anything. I've seen this happen on some of our stuff at work. Kind of makes it tough to debug when code you're expecting to execute doesn't execute. :-) Hence, do not use the optimizations until your code is solid.

#29425 - sajiimori - Fri Nov 19, 2004 10:04 pm

Yeah, intentionally debugging an optimized build is pure insanity. :)

Occasionally it has to be done to work around a compiler bug or to fix a bug in your own code that was only revealed by optimizations, and then you're stuck debugging in assembler. >_<

#29428 - tepples - Fri Nov 19, 2004 11:08 pm

isildur wrote:
Does anyone use -O3 for real code?

TOD is compiled with -O3. The bugs it introduces in practice mostly relate to not declaring the right things as volatile.
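
A typical case of the volatile problem, as a minimal sketch. The names vblank_seen, on_vblank and wait_for_vblank are hypothetical; on real hardware on_vblank would be called from an interrupt handler rather than from normal code.

```c
/* Flag written outside the normal flow of wait_for_vblank().
 * Without volatile, an optimizing compiler may read it once and then
 * spin forever on the cached value. */
static volatile int vblank_seen;

/* Stand-in for the interrupt handler. */
void on_vblank(void)
{
    vblank_seen = 1;
}

/* Busy-waits until the flag is set, then clears it for the next frame. */
void wait_for_vblank(void)
{
    while (!vblank_seen) {
        /* volatile forces a fresh read of the flag on every pass */
    }
    vblank_seen = 0;
}
```

Drop the volatile and this usually still works in an unoptimized build, which is why the bug tends to show up only after switching to -O2 or -O3.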

allenu wrote:
It's basically removed the unnecessary call to x()

Not removed but inlined. Total removal of a function call should happen only if GCC thinks x() is a pure function, one with no side effects. Sometimes GCC can get confused about pure functions if you forget a volatile somewhere.
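
A small sketch of that distinction. All names here (fake_reg, sub, sub_and_log, demo) are invented for the example, and fake_reg is an ordinary variable standing in for a memory-mapped hardware register.

```c
#include <stdint.h>

/* Stand-in for a hardware register; volatile means every access must happen. */
static volatile uint32_t fake_reg;

/* No side effects: the result depends only on the arguments. */
static int sub(int a, int b)
{
    return a - b;
}

/* Same arithmetic, but with a visible side effect (a volatile store). */
static int sub_and_log(int a, int b)
{
    fake_reg = (uint32_t)(a - b);
    return a - b;
}

int demo(void)
{
    sub(12, 77);          /* unused result, no side effects: may vanish entirely */
    sub_and_log(12, 77);  /* the call can be inlined, but the store must remain  */
    return 0;
}
```

With optimization on, the compiler may inline both calls; after that the first one leaves nothing behind, while the second still has to perform the write to fake_reg.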
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#29431 - sajiimori - Fri Nov 19, 2004 11:25 pm

x was purely functional. That's why there's no sign of it at all. The definition wasn't even generated because it was declared static, so the compiler knows it couldn't be called elsewhere.