#176023 - sverx - Tue Mar 22, 2011 12:23 pm
I'm not sure the C/C++ section is the correct one, but this isn't exactly an ASM question so I guess it could be placed here... ok, let's go on ;)
I've been curious lately to see what the compiler does with some C code I'm writing, so I added --save-temps to the command line and started reading the output .s files. They've already made me think that the compiler is doing some strange things.
For instance, I've got an array of structs and I need to set the 5th, 6th, 7th and 8th members of the struct (they're all u16) to 0xFFFF at a given index of the array. The compiler does this (comments are mine):
Code: |
ldr r1, .L37+8 @ this loads the start point + 8 for the off.
lsl r3, r0, #4 @
add r1, r1, r3 @ makes r1 point to the element to change
mov r2, r1 @ copy to r2 -???-
mov r3, #1 @
neg r3, r3 @ these 2 lines set r3 to -1
add r2, r2, #8 @ add 8 to r2 -???-
strh r3, [r1, #8]
strh r3, [r2, #2]
strh r3, [r2, #4]
strh r3, [r2, #6]
|
The question is: why is it using r2? Couldn't it just use one register and the 4 different offsets?
#176024 - Quirky - Tue Mar 22, 2011 1:36 pm
What's the C code that you're using and the optimisation flags? I can't reproduce this with -Os and the following code:
Code: |
struct test {
short null0;
short null1;
short null2;
short null3;
short a;
short b;
short c;
short d;
};
extern struct test *t;
void set_to_ff(void)
{
t->a = -1;
t->b = -1;
t->c = -1;
t->d = -1;
}
|
That gives the following thumb:
Code: |
ldr r3, .L3
ldr r2, [r3]
mov r3, #1
neg r3, r3
strh r3, [r2, #8]
strh r3, [r2, #10]
strh r3, [r2, #12]
strh r3, [r2, #14]
|
#176025 - sverx - Tue Mar 22, 2011 1:47 pm
extract from the code:
Code: |
typedef struct node {
unsigned short int m;
unsigned short int n;
unsigned short int o;
unsigned short int p;
unsigned short int a;
unsigned short int b;
unsigned short int c;
unsigned short int d;
} node;
node Set[4096];
(...)
Set[n].a=0xFFFF;
Set[n].b=0xFFFF;
Set[n].c=0xFFFF;
Set[n].d=0xFFFF; |
flags are Quote: |
-g -Wall -O2 -fomit-frame-pointer -ffast-math --save-temps -mthumb -mthumb-interwork -march=armv5te -mtune=arm946e-s |
your code looks like what I'd expected to see... :|
#176026 - Quirky - Wed Mar 23, 2011 9:11 am
This is interesting. If you have this code:
Code: |
void set_to_ff(int n)
{
Set[n].a=0xFFFF;
Set[n].b=0xFFFF;
Set[n].c=0xFFFF;
Set[n].d=0xFFFF;
}
|
Then the thumb output has the extra additions that you mention, like this:
Code: |
ldr r2, .L3 @ r2 = Set
lsl r0, r0, #4 @ r0 = n * 16
mov r3, #1
neg r3, r3 @ r3 = 0xffff:ffff
add r0, r0, #8 @ r0 += 8
strh r3, [r0, r2] @ *(r2 + r0) = 0xffff
add r0, r2, r0 @ r0 = Set + (n * 16 + 8)
strh r3, [r0, #2] @ *(r0 + 2) = 0xffff, ... etc.
strh r3, [r0, #4]
strh r3, [r0, #6]
|
When you provide a pointer alias to the n'th entry, you get better code:
Code: |
void set_to_ff(int n)
{
struct test *s = &Set[n];
s->a=0xFFFF;
s->b=0xFFFF;
s->c=0xFFFF;
s->d=0xFFFF;
}
|
This is the thumb:
Code: |
ldr r3, .L3 @ r3 = Set
lsl r0, r0, #4 @ r0 = n * 16
add r0, r3, r0 @ r0 = Set + n * 16
mov r3, #1
neg r3, r3 @ r3 = 0xffff:ffff
strh r3, [r0, #8] @ r0[8] = 0xffff ... etc.
strh r3, [r0, #10]
strh r3, [r0, #12]
strh r3, [r0, #14]
|
The 2nd snippet is more idiomatic C IMO, since it has a less copy-paste look to it.
#176027 - sverx - Wed Mar 23, 2011 9:37 am
Quirky wrote: |
This is interesting. |
Indeed. Looks like it's not really performing well, IMHO. BTW it's very interesting that its behavior can be changed simply by declaring a pointer to the n'th entry and using that. It's a good workaround.
Another strange thing I've seen is this one:
Code: |
ldr r2, .L5 @ load array address
lsr r3, r0, #3 @ calculate the element I need
ldrb r3, [r2, r3] @ load the element
mov r2, #7 @
and r0, r0, r2 @ calculate the bit I want to take
asr r3, r3, r0 @ shift it to the LSB
mov r0, r3 @ copy r3 to r0 -???-
mov r3, #1 @
and r0, r0, r3 @ clear other bits
bx lr |
this extracts a requested bit from a byte array (returns 0 or 1).
Why is it copying r3 to r0? Couldn't it simply load 1 into r0 and AND it with r3?
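For reference, C source that would plausibly compile to the listing above looks something like the sketch below. The original function wasn't posted, so the name get_bit and its signature are only assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical helper (name and signature are assumptions, not the
 * original source): returns bit `index` of a packed byte array as 0 or 1. */
static inline unsigned get_bit(const uint8_t *bits, unsigned index)
{
    /* index >> 3 selects the byte, index & 7 selects the bit inside it,
     * matching the lsr / ldrb / and / asr / and sequence in the listing. */
    return (bits[index >> 3] >> (index & 7)) & 1u;
}
```

With a two-byte array {0x05, 0x80}, get_bit returns 1 for indexes 0, 2 and 15 and 0 for the others.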
#176028 - Quirky - Wed Mar 23, 2011 12:28 pm
I wouldn't obsess too much over these things. In most cases you can write C code that helps the compiler (like the pointer thing), in other cases you're better off having clearer code. If you hit a bottleneck, optimise there, but only after profiling to see where the slowdowns are.
#176029 - sverx - Wed Mar 23, 2011 1:26 pm
Quirky wrote: |
I wouldn't obsess too much over these things. |
Of course. It's just curiosity, and it may teach me some interesting things, actually. BTW I switched to using a pointer to the n'th element, as in your example, and the resulting asm code didn't change at all. I guess it's because the code I pasted is actually part of a more complex function, even if there's just a simple if before that part.
Thanks! :)
#176030 - Miked0801 - Wed Mar 23, 2011 7:17 pm
Let the compiler do the compiling. It's hard enough to code without worrying about the little details. Yes, it's interesting to see how the compiler does things, and yes, quite often it does them a bit sub-optimally, but as long as it works, a little inefficiency around the edges isn't going to cause much harm. A profile will show you where your efforts will be better spent.
#176032 - sverx - Thu Mar 24, 2011 8:59 am
Actually, how do you guys profile? Which tools could I use, aside from building my own profiling routines? (I know it's OT but YOU triggered it ;) )
#176033 - Quirky - Thu Mar 24, 2011 9:27 am
For GBA, you can profile using Visual Boy Advance and gprof.
I've not profiled DS code natively. The tools just aren't there. I get the parts I want to profile to compile for Linux and then use gprof there.
There's also the old skool profiling technique using colored bands. Set BG_PALETTE[0] to red at the start of a section you want to make faster, then black at the end. The red band shows how long the function took to execute in terms of horizontal lines on the screen. The challenge is to try and make the red band smaller.
#176034 - sverx - Thu Mar 24, 2011 9:36 am
Actually I'm 'profiling' saving the VCOUNT at the beginning and at the end of the section and printing those at the end, which is kind of changing the palette color 0 but easier to read ;)
This of course doesn't tell me how many times a function has been called or how much time was spent in each single function, so I think I really can't call what I'm doing now 'profiling'... but the problem is that I found no tools (on DS) to do what I'd like to, that's why I was wondering...
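A rough sketch of the VCOUNT arithmetic, in case it's useful: the helper below only computes the difference between two samples and handles the wrap at the end of the frame. The name scanlines_elapsed is made up for the example, and the 263-lines-per-frame figure is an assumption about the DS display (192 visible lines plus vblank); reading the register itself is left to the caller.

```c
#include <stdint.h>

/* Assumption: the DS display counts 263 scanlines per frame. */
#define SCANLINES_PER_FRAME 263u

/* Hypothetical helper: scanlines elapsed between two VCOUNT samples
 * that were taken less than one frame apart. */
static inline unsigned scanlines_elapsed(uint16_t start, uint16_t end)
{
    /* Adding one full frame before subtracting keeps the result
     * non-negative when the counter wrapped between the samples. */
    return (end + SCANLINES_PER_FRAME - start) % SCANLINES_PER_FRAME;
}
```

Sections longer than one frame can't be measured this way, since the counter alone can't tell how many times it wrapped.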
#176053 - Miked0801 - Mon Mar 28, 2011 6:00 pm
The pay version of No$ has a nice, non-intrusive profiler built in. Failing that, you get to do it old school: use a timer at the highest resolution and record the start and stop times in the functions you wish to profile, store them over time, and compare them against the entire gameloop's time to get a feel for the overall percentage of time per gameloop spent in a function.
For DS, you can use the hardware timers if they are available. They have a very high precision. Next up would be the VCOUNT register, with a decent resolution. Keep in mind that profiling of this sort is intrusive: it will slow down your game and can skew your profile results if you aren't careful to omit profile overhead from your results. It also burns some RAM for storing the various heuristics.
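A minimal sketch of that start/stop bookkeeping might look like the code below. All names here (prof_section, prof_begin, prof_end, prof_percent, MAX_SECTIONS) are hypothetical, and the caller passes in the current tick value so the sketch doesn't assume any particular timer API. It assumes a free-running 32-bit tick counter and does not subtract the profiling overhead mentioned above.

```c
#include <stdint.h>

#define MAX_SECTIONS 8   /* arbitrary limit chosen for the sketch */

typedef struct {
    uint32_t start;   /* tick value recorded by prof_begin */
    uint32_t total;   /* ticks accumulated over all calls */
    uint32_t calls;   /* number of completed begin/end pairs */
} prof_section;

static prof_section prof_sections[MAX_SECTIONS];

/* Clear all counters, e.g. at the start of each gameloop. */
static void prof_reset(void)
{
    for (int i = 0; i < MAX_SECTIONS; i++) {
        prof_sections[i].start = 0;
        prof_sections[i].total = 0;
        prof_sections[i].calls = 0;
    }
}

/* Record the tick value at the start of section `id`. */
static void prof_begin(int id, uint32_t now)
{
    prof_sections[id].start = now;
}

/* Accumulate the ticks spent since the matching prof_begin. */
static void prof_end(int id, uint32_t now)
{
    /* Unsigned subtraction stays correct across one 32-bit wraparound. */
    prof_sections[id].total += now - prof_sections[id].start;
    prof_sections[id].calls++;
}

/* Share of the gameloop spent in section `id`, as a whole percentage. */
static unsigned prof_percent(int id, uint32_t frame_ticks)
{
    if (frame_ticks == 0)
        return 0;
    return (unsigned)((uint64_t)prof_sections[id].total * 100u / frame_ticks);
}
```

The RAM cost is the prof_sections table: 12 bytes per section here, more if you keep per-frame history.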
#176059 - sverx - Tue Mar 29, 2011 1:07 pm
Thank you :)