#29364 - sgeos - Fri Nov 19, 2004 2:26 am
This is my C code:
Code: |
int x(int a, int b)
{
    return a - b;
}
int main(void)
{
    int a[2] = {12, 77};
    x(a[0], a[1]);
    return 0;
}
|
This is the ASM output. (gcc -S test.c)
Code: |
.file "test.c"
.text
.align 2
.global x
.type x,function
x:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 1, uses_anonymous_args = 0
mov ip, sp
stmfd sp!, {fp, ip, lr, pc}
sub fp, ip, #4
sub sp, sp, #8
str r0, [fp, #-16]
str r1, [fp, #-20]
ldr r2, [fp, #-16]
ldr r3, [fp, #-20]
rsb r3, r3, r2
mov r0, r3
ldmea fp, {fp, sp, pc}
.Lfe1:
.size x,.Lfe1-x
.section .rodata
.align 2
.LC0:
.word 12
.word 77
.text
.align 2
.global main
.type main,function
main:
@ args = 0, pretend = 0, frame = 8
@ frame_needed = 1, uses_anonymous_args = 0
mov ip, sp
stmfd sp!, {fp, ip, lr, pc}
sub fp, ip, #4
sub sp, sp, #8
ldr r3, .L3
ldmia r3, {r0-r1}
sub r2, fp, #16
stmda r2, {r0-r1}
ldr r0, [fp, #-20]
ldr r1, [fp, #-16]
bl x
mov r3, #0
mov r0, r3
ldmea fp, {fp, sp, pc}
.L4:
.align 2
.L3:
.word .LC0
.Lfe2:
.size main,.Lfe2-main
.ident "GCC: (GNU) 3.2.2 (DevKit Advance R5 Beta 3)"
|
It looks like the compiler stores variables and parameters to the stack and then loads them right back again. What is up with the whole
mov ip, sp / stmfd sp!, {fp, ip, lr, pc} deal? Why are only three values being popped from the stack when four were pushed? Why is 8 being subtracted from the stack pointer?
I think the array initialization is kind of nifty.
-Brendan
#29367 - sajiimori - Fri Nov 19, 2004 2:56 am
I wouldn't bother trying to decipher compiler output unless you're using at least -O2.
#29370 - DekuTree64 - Fri Nov 19, 2004 4:07 am
That is the craziest way to do one subtraction I have ever seen. It took me quite a while to even figure out why that works at all.
Apparently fp is mainly used for debugging, and it's storing your parameters relative to it so the debugger can look them up. Loading them back again is a total waste no matter how you look at it though.
Also strange is that it would decide to use a reverse subtract instead of a normal one, and then mov the result to r0 instead of just putting the result of the subtract there. Not to mention that return sequence. ldmea (the stack-style name for ldmdb) means to decrement and then load, so since fp points to the 'pc' that was pushed onto the stack (4 less than the original sp), it decrements, then loads the lr that was pushed before into pc, then loads the pushed ip (which had just been set to the original sp) into sp, then loads the original fp. The pushed pc itself is never loaded back, which is why only three registers are popped.
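To make the push and pop order concrete, here is a rough C model of the prologue and epilogue from the unoptimized listing. It is only a sketch: the struct regs type, the toy stack_mem array, and the addresses are made up for illustration, and the value stored for pc is a placeholder (the real stored value depends on the pipeline). The assumption is the usual ARM rule that a multiple store/load puts the lowest-numbered register at the lowest address.

```c
#include <stdint.h>

/* Hypothetical model of the registers the prologue/epilogue touch. */
struct regs { uint32_t fp, ip, sp, lr, pc; };

/* Toy full-descending stack covering byte addresses 0x1000..0x10FF. */
enum { STACK_BASE = 0x1000, STACK_WORDS = 64 };
static uint32_t stack_mem[STACK_WORDS];

/* Map a byte address onto the toy stack array. */
static uint32_t *slot(uint32_t addr)
{
    return &stack_mem[(addr - STACK_BASE) / 4];
}

/* mov ip, sp ; stmfd sp!, {fp, ip, lr, pc} ; sub fp, ip, #4 ; sub sp, sp, #8 */
void prologue(struct regs *r)
{
    r->ip = r->sp;                 /* remember the caller's sp          */
    r->sp -= 16;                   /* stmfd: make room for four words   */
    *slot(r->sp + 0)  = r->fp;     /* lowest register, lowest address   */
    *slot(r->sp + 4)  = r->ip;     /* this is the caller's sp           */
    *slot(r->sp + 8)  = r->lr;     /* return address                    */
    *slot(r->sp + 12) = r->pc;     /* only a marker, never popped       */
    r->fp = r->ip - 4;             /* fp points at the saved pc slot    */
    r->sp -= 8;                    /* frame = 8: two 4-byte locals      */
}

/* ldmea fp, {fp, sp, pc} : loads the three words just below fp. */
void epilogue(struct regs *r)
{
    uint32_t base = r->fp;
    r->fp = *slot(base - 12);      /* saved fp                          */
    r->sp = *slot(base - 8);       /* saved ip, i.e. the caller's sp    */
    r->pc = *slot(base - 4);       /* saved lr, so this is the return   */
}
```

In this model the two locals of x() sit at fp-16 and fp-20, which are exactly the 8 bytes reserved by the sub sp, sp, #8. Restoring sp from the saved ip throws away the locals and the saved pc in one go.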
I'm a little curious to see what it will do with debugging turned off and optimizations on.
-Deku
#29375 - sgeos - Fri Nov 19, 2004 5:45 am
Is there a better way to turn off debugging?
gcc -O2 -g0 -S test.c
Code: |
.file "test.c"
.text
.align 2
.global x
.type x,function
x:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
rsb r0, r1, r0
@ lr needed for prologue
mov pc, lr
.Lfe1:
.size x,.Lfe1-x
.section .rodata
.align 2
.LC0:
.word 12
.word 77
.text
.align 2
.global main
.type main,function
main:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 1, uses_anonymous_args = 0
ldr r2, .L3
mov ip, sp
stmfd sp!, {r4, fp, ip, lr, pc}
ldmia r2, {r3-r4}
sub fp, ip, #4
mov r1, r4
mov r0, r3
bl x
mov r0, #0
ldmea fp, {r4, fp, sp, pc}
.L4:
.align 2
.L3:
.word .LC0
.Lfe2:
.size main,.Lfe2-main
.ident "GCC: (GNU) 3.2.2 (DevKit Advance R5 Beta 3)"
|
-Brendan
#29377 - allenu - Fri Nov 19, 2004 5:59 am
Basically, what's happening is that in main, the code is creating local variables on the stack. It allocates room for them and then loads up the constant values (12 and 77) from another location in memory. Then, in anticipation of the function call to x(), it has to load the parameters into the r0 and r1 registers, which I assume are the registers used for passing the first two parameters.
Inside the function x(), it has to copy the contents from r0 and r1 to local variables 'a' and 'b', which are on the stack. Now the totally redundant step: it copies them to registers r2 and r3. It does the subtraction using these registers and copies the result to r0, which is used as the 'return value' register.
I'm not familiar with ARM assembly, but it looks to me that stmfd and ldmea are for storing and restoring multiple registers on the stack.
#29378 - allenu - Fri Nov 19, 2004 6:01 am
sgeos wrote: |
Is there a better way to turn off debugging?
gcc -O2 -g0 -S test.c
|
Wow, the optimizer did a good job here. It got rid of the unnecessary local variables. As far as I can tell, there is no debugging code in this or the original assembly output.
#29406 - sajiimori - Fri Nov 19, 2004 6:59 pm
If you want to see the compiler do a really good job, then declare 'x' static, compile with -O3, -fomit-frame-pointer, and rename 'main' to something else so it won't generate startup code. If you're using GCC 3.x, the results are almost humorous.
#29408 - isildur - Fri Nov 19, 2004 7:55 pm
Does anyone use -O3 for real code? I always heard that it's something not to do because it over-optimizes and can create bugs.
#29410 - sajiimori - Fri Nov 19, 2004 8:12 pm
Any differences in behavior are the result of compiler bugs. GCC has gotten more solid over the years, and we use -O3 in our studio.
#29419 - sgeos - Fri Nov 19, 2004 8:58 pm
sajiimori wrote: |
If you want to see the compiler do a really good job, then declare 'x' static, compile with -O3, -fomit-frame-pointer, and rename 'main' to something else so it won't generate startup code. If you're using GCC 3.x, the results are almost humorous. |
lol. I'll post the results for others.
gcc -fomit-frame-pointer -O3 -g0 -S test2.c
Code: |
.file "test2.c"
.section .rodata
.align 2
.LC0:
.word 12
.word 77
.text
.align 2
.global maine
.type maine,function
maine:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
@ lr needed for prologue
mov r0, #0
mov pc, lr
.Lfe1:
.size maine,.Lfe1-maine
.ident "GCC: (GNU) 3.2.2 (DevKit Advance R5 Beta 3)"
|
Compiling with these options does, of course, defeat the purpose of the exercise. I'm trying to figure out what the compiler is doing, and why it works. In what order are registers pushed onto and popped off of the stack?
-Brendan
#29421 - allenu - Fri Nov 19, 2004 9:13 pm
Ha, nice. It's basically removed the unnecessary call to x() as the results aren't used by anything. I've seen this happen on some of our stuff at work. Kind of makes it tough to debug when code you're expecting to execute doesn't execute. :-) Hence, do not use the optimizations until your code is solid.
#29425 - sajiimori - Fri Nov 19, 2004 10:04 pm
Yeah, intentionally debugging an optimized build is pure insanity. :)
Occasionally it has to be done to work around a compiler bug or to fix a bug in your own code that was only revealed by optimizations, and then you're stuck debugging in assembler. >_<
#29428 - tepples - Fri Nov 19, 2004 11:08 pm
isildur wrote: |
Does anyone use -O3 for real code? |
TOD is compiled with -O3. The bugs it introduces in practice mostly relate to not declaring the right things as volatile.
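A minimal sketch of the kind of volatile problem meant here, assuming a flag that an interrupt handler sets behind the compiler's back. The names vblank_seen, fake_interrupt_handler and wait_for_vblank are made up for illustration and are not from TOD. Without the volatile qualifier, an optimizing compiler is allowed to read the flag once and keep it in a register, so the loop might never notice the change.

```c
/* Hypothetical flag that an interrupt handler would set. */
static volatile int vblank_seen = 0;

/* Stand-in for a real interrupt handler. */
void fake_interrupt_handler(void)
{
    vblank_seen = 1;
}

/* Poll the flag, giving up after max_spins iterations.
 * Because vblank_seen is volatile, every loop iteration
 * re-reads it from memory instead of reusing a cached value. */
int wait_for_vblank(int max_spins)
{
    int spins = 0;
    while (!vblank_seen && spins < max_spins)
        ++spins;
    return vblank_seen;
}
```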
allenu wrote: |
It's basically removed the unnecessary call to x() |
Not removed but inlined. Total removal of a function call should happen only if GCC thinks x() is a pure function, one with no side effects. Sometimes GCC can get confused about pure functions if you forget a volatile somewhere.
#29431 - sajiimori - Fri Nov 19, 2004 11:25 pm
x was purely functional. That's why there's no sign of it at all. The definition wasn't even generated because it was declared static, so the compiler knows it couldn't be called elsewhere.
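For comparison, a small variant of the test, sketched on the assumption that the only change is to actually use the result of x(). The name maine is taken from the renamed test above. Since the return value now depends on the call, the compiler can no longer throw the subtraction away, although it is still free to inline x() and fold the whole thing down to a constant.

```c
/* static: no other translation unit can call x, so the compiler
 * need not emit a standalone definition if every call is inlined. */
static int x(int a, int b)
{
    return a - b;
}

int maine(void)
{
    int a[2] = {12, 77};
    /* The result is used, so the computation must be preserved. */
    return x(a[0], a[1]);
}
```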