gbadev.org forum archive

Me again ;-)
So I've replaced as many strbs (and all the funny variants of strb) in my code as I can, and removed all the memcpys, memsets, strcpys, strncpys etc etc...but I'm still getting dodgy results.

My problems begin in one function which has no strbs, yet runs fine from normal memory. Here's the guts of the function (I have cut a lot away to replicate this funny condition btw):

Code:

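void Mod_LoadPlanes (lump_t *l)
{
   int       i, j;
   mplane_t *out;
   dplane_t *in;
   int       count;
   int       bits;

   // (setup of in, out and count has been cut away for this example)

   for (i=0 ; i<count ; i++)
   {
      for (j=0 ; j<3 ; j++)
      {
         // float -> fixed_point store; this is where the memcpy turns up
         out->normal[j] = LittleFloat (in->normal[j]);
      }
   }
}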

Again, this cut-down version doesn't function correctly as it stands, but it is still enough to generate the weirdness.
The prototype for LittleFloat is

Code:

extern float (*LittleFloat) (float l);

i.e. it's a function pointer, so it gets called with a bx to a register.
Normal is of type fixed_point, and there's operator overloading which converts from one to the other. The fixed_point type is four bytes in size.

Yet the disassembly looks like this:

Code:

void Mod_LoadPlanes (lump_t *l)
201bbb4: e92d4070 stmdb sp!, {r4, r5, r6, lr}
201bbb8: e3a06000 mov r6, #0 ; 0x0
201bbbc: e24dd008 sub sp, sp, #8 ; 0x8
201bbc0: ea000013 b 201bc14 <_Z14Mod_LoadPlanesP6lump_t+0x60>
201bbc4: e3a05000 mov r5, #0 ; 0x0
201bbc8: e7950004 ldr r0, [r5, r4]
201bbcc: e59f3054 ldr r3, [pc, #84] ; 201bc28 <.text+0x1b9e8>
201bbd0: e593c000 ldr ip, [r3]
201bbd4: e1a0e00f mov lr, pc
201bbd8: e12fff1c bx ip
201bbdc: e28d4004 add r4, sp, #4 ; 0x4
201bbe0: e1a01000 mov r1, r0
201bbe4: e59f3040 ldr r3, [pc, #64] ; 201bc2c <.text+0x1b9ec>
201bbe8: e1a00004 mov r0, r4
201bbec: e1a0e00f mov lr, pc
201bbf0: e12fff13 bx r3
201bbf4: e0850004 add r0, r5, r4
201bbf8: e1a01004 mov r1, r4
201bbfc: e2855004 add r5, r5, #4 ; 0x4
201bc00: e3a02004 mov r2, #4 ; 0x4
201bc04: eb012e41 bl 2067510 <memcpy> <---- memcpy for four bytes!
201bc08: e355000c cmp r5, #12 ; 0xc
201bc0c: 1affffed bne 201bbc8 <_Z14Mod_LoadPlanesP6lump_t+0x14>
201bc10: e2866001 add r6, r6, #1 ; 0x1
201bc14: e1560004 cmp r6, r4
201bc18: baffffe9 blt 201bbc4 <_Z14Mod_LoadPlanesP6lump_t+0x10>
201bc1c: e28dd008 add sp, sp, #8 ; 0x8
201bc20: e8bd4070 ldmia sp!, {r4, r5, r6, lr}
201bc24: e12fff1e bx lr
201bc28: 020ac70c andeq ip, sl, #3145728 ; 0x300000
201bc2c: 02076114 andeq r6, r7, #5 ; 0x5

Using a bit of objdump and nm tells me that 20ac70c is the function pointer LittleFloat, so the ldr followed by the bx (at 201bbd0) does the call to LittleFloat.
2076114 (the target of the second bx) is the function which promotes float types to my fixed_point class (size 4 bytes). This fixed_point class is then stored in normal[j] (normal is of type *fixed_point).

So (if that made any sense)...what's with the four-byte memcpy? This is the code generated with -Os - I don't get the memcpy if I compile it without the option.

I wouldn't normally care about these memcpys too much, but they are breaking my slot-2 shenanigans.

So to reiterate:
- how do I get rid of four-byte memcpys? The whole point of me doing the floating/fixed point stuff was to make it fast - extra code ain't gonna help
- if I can't get rid of the memcpy, how can I tell it to use a different (ie my) memcpy? I could replace the first instruction of memcpy with a branch to my memcpy, but that's a bit hacky.

Ta all.

PS: if there are mistakes, it's cos I'm tired!
_________________
Big thanks to everyone who donated for Quake2

Weird. I've tried to replicate the issue based on what you're reporting here, and I can't get it to generate a memcpy. Could you give the complete set of build-flags?

Since the DS is little endian and the code is probably quite far from being portable now, would it help to rip out all the LittleX function pointers and replace them with dummy inline functions or macros?
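
A minimal sketch of what that could look like, assuming the target stays little endian so that every "little to host" conversion is the identity. Only LittleFloat appears in this thread; the LittleLong and LittleShort names and signatures are assumptions here, so adjust them to whatever the source really declares.

```cpp
#include <cstdint>

// Hypothetical inline replacements for the LittleX function pointers.
// On a little-endian target there is nothing to swap, so each one just
// returns its argument and the compiler can inline the call away.
static inline float LittleFloat(float f)
{
    return f;
}

// Assumed name and signature, not taken from this thread.
static inline std::int32_t LittleLong(std::int32_t l)
{
    return l;
}

// Assumed name and signature, not taken from this thread.
static inline std::int16_t LittleShort(std::int16_t s)
{
    return s;
}
```

With these in a header, the call through bx to a register goes away. Whether the four-byte memcpy goes with it is something only the disassembly can tell you.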
_________________
http://aaiiee.wordpress.com/

Do you need to override the assignment operator (operator=) for your fixed point class?
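
Something along these lines, as a rough sketch only. The class layout, the 16.16 format and the member names are all assumptions, since the real fixed_point class isn't shown in this thread. The idea is to spell out the copy as a single 32-bit member assignment instead of leaving the compiler to pick its own way of copying the object.

```cpp
#include <cstdint>

// Hypothetical 16.16 fixed-point class; the real one may look different.
class fixed_point {
public:
    fixed_point() : raw_(0) {}

    // float -> fixed conversion, defined in the class so it can be inlined.
    fixed_point(float f) : raw_(static_cast<std::int32_t>(f * 65536.0f)) {}

    // Explicit copy constructor and copy assignment: each is just one
    // 32-bit load and one 32-bit store of the underlying integer.
    fixed_point(const fixed_point &rhs) : raw_(rhs.raw_) {}
    fixed_point &operator=(const fixed_point &rhs)
    {
        raw_ = rhs.raw_;
        return *this;
    }

    float to_float() const { return static_cast<float>(raw_) / 65536.0f; }
    std::int32_t raw() const { return raw_; }

private:
    std::int32_t raw_;
};

// The thread states the class is four bytes in size; check the sketch agrees.
static_assert(sizeof(fixed_point) == 4, "fixed_point should be four bytes");
```

Nothing here guarantees the memcpy disappears, so the generated code would still need checking with objdump.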

kusma wrote:

Weird. I've tried to replicate the issue based on what you're reporting here, and I can't get it to generate a memcpy. Could you give the complete set of build-flags?

I can't seem to find a similar problem in older builds (in that function, it may just happen elsewhere). In this build the float->fixed overloading is bx'ed, then memcpy'd. In older builds the float->fixed is inlined and no memcpy is used.
This happens regardless of doing it in thumb or arm...

The build line I'm using is pretty much what's in the makefile you have.

Quote:

-Dstricmp=strcasecmp -I "c:/devkitPro/libnds/include" -I "c:/devkitPro/libnds/include/nds" -DARM9 -mthumb-interwork -fno-rtti -fno-exceptions -g -Os -mtune=arm9

Quote:

Do you need to override the assignment operator (operator=) for your fixed point class?

Maybe..I'll have a go...

Quote:

would it help to rip out all the LittleX function pointers

Probably - but the thing I'm worried about is, if it can happen here, where else is it happening?

Hmm ;-)

EDIT: kusma, it happens in just three other places (bar model loading) - check out r_alias.c, line 413 (fixed point -> fixed point store), r_aclip.c, line 266 (float -> float store) and line 253 (float->float). Obv the source may have changed a little bit but just objdump the functions those line numbers lie in and search for bls to memcpy :-)
_________________
Big thanks to everyone who donated for Quake2

To be honest, I only tried to reproduce it stand-alone. Perhaps I'll dig into the sources and give it a go. But now I feel like sleeping ;)

memcpy does 32-bit writes as long as the source and destination are both word aligned and the number of bytes to copy is a multiple of 4.
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."

A little off topic, but why -Os instead of -O2 or -O3?
If the correct code is generated with -O2, is there any advantage to smaller code size?

Lazy1 wrote:

A little off topic, but why -Os instead of -O2 or -O3?
If the correct code is generated with -O2, is there any advantage to smaller code size?

The correct code was being generated with no -O at all - I haven't tried O2/3 yet. But anyway, I've always used -Os and thumb due to the complete lack of memory. But now I don't need to cos I've got THIRTY TWO MEGS EXTRA MEMORY!
Can you tell I've just got it working? Time to dig out the old thread again ;-)

BTW: I objdump/grep/addr2lined the elf, looking for memcpys and manually sorting out the cases where it happens (not the most robust of methods!)
_________________
Big thanks to everyone who donated for Quake2

-O2 pads the code with nops due to some idea that aligning branch targets to 8 byte boundaries improves performance. It might on some systems, but probably not on this one. -Os does not add padding.
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."

Yay!

Probably would be a good idea to check for similar problems with memset too. I've had the compiler generate calls to that for things like initializing local structs to 0.
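
A small sketch of the kind of code meant here. The struct and function names are made up for illustration. There is no library call in the source, but the compiler is allowed to implement the zero-initialisation with a call to memset, depending on the struct size and the optimisation flags.

```cpp
// Hypothetical struct, large enough that clearing it member by member
// may look less attractive to the compiler than one block fill.
struct trace_result_t {
    int   contents;
    float fraction;
    float endpos[3];
    int   pad[8];
};

// No memset in sight, yet the zero-initialisation below is a candidate
// for a compiler-generated call to memset.
inline trace_result_t make_empty_trace()
{
    trace_result_t t = {};
    return t;
}
```

Searching the disassembly for bl to memset, the same way as for memcpy, is the only reliable way to see whether a given function is affected.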

Also, have you gotten cache to work on the expanded memory? I remember someone was saying that you can't cache GBA cart space a while back, but I never got around to testing it.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

Looking at the GCC docs, it appears that GCC will happily generate calls to memcpy, memmove, memset and memcmp, expecting these to be provided externally.

You could replace those functions in libg.a with your own GBA slot safe versions. Something like:

Code:

ar dv libg.a lib_a-memcpy.o lib_a-memset.o lib_a-memcmp.o lib_a-memmove.o
ar rs libg.a custom_fns_obj_file.o
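
As a rough idea of what a slot-safe copy could look like, here is a sketch that never issues a byte store. It assumes the problem is 8-bit writes to slot-2 memory (which is why the strbs were removed earlier in the thread), assumes a little-endian target, and uses GCC's may_alias attribute for the 16-bit accesses. The name slot2_memcpy is hypothetical; for the libg.a swap above it would have to be built as memcpy.

```cpp
#include <cstddef>
#include <cstdint>

// 16-bit type that is allowed to alias other types (GCC/Clang attribute).
typedef std::uint16_t __attribute__((__may_alias__)) u16_alias;

// Write one byte by read-modify-writing the halfword that contains it.
// Assumes little-endian byte order, as on the DS.
static inline void store_byte_as_halfword(std::uint8_t *p, std::uint8_t value)
{
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    volatile u16_alias *half = reinterpret_cast<volatile u16_alias *>(
        addr & ~static_cast<std::uintptr_t>(1));
    std::uint16_t old = *half;
    if (addr & 1)
        *half = static_cast<std::uint16_t>((old & 0x00FF) | (value << 8));
    else
        *half = static_cast<std::uint16_t>((old & 0xFF00) | value);
}

// Hypothetical slot-safe copy: all stores are 16 bits wide.
extern "C" void *slot2_memcpy(void *dst, const void *src, std::size_t n)
{
    std::uint8_t *d = static_cast<std::uint8_t *>(dst);
    const std::uint8_t *s = static_cast<const std::uint8_t *>(src);

    // Leading odd byte, so that d becomes halfword aligned.
    if (n && (reinterpret_cast<std::uintptr_t>(d) & 1)) {
        store_byte_as_halfword(d++, *s++);
        --n;
    }
    // Bulk: whole halfwords, built from byte reads so src may be unaligned.
    while (n >= 2) {
        *reinterpret_cast<volatile u16_alias *>(d) =
            static_cast<std::uint16_t>(s[0] | (s[1] << 8));
        d += 2;
        s += 2;
        n -= 2;
    }
    // Trailing odd byte.
    if (n)
        store_byte_as_halfword(d, *s);
    return dst;
}
```

This favours simplicity over speed. A real replacement would probably add a 32-bit path for the case where source and destination are both word aligned, and memset and memmove would need the same treatment.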

DS development > Extra calls to memcpy - how do I get rid of these?

#134801 - simonjhall - Tue Jul 17, 2007 12:08 am

#134803 - kusma - Tue Jul 17, 2007 12:38 am

#134804 - PeterM - Tue Jul 17, 2007 12:54 am

#134838 - elhobbs - Tue Jul 17, 2007 2:42 pm

#134877 - simonjhall - Tue Jul 17, 2007 10:12 pm

#134883 - kusma - Tue Jul 17, 2007 11:08 pm

#134884 - Dwedit - Tue Jul 17, 2007 11:18 pm

#134885 - Lazy1 - Tue Jul 17, 2007 11:31 pm

#134886 - simonjhall - Tue Jul 17, 2007 11:51 pm

#134887 - Dwedit - Wed Jul 18, 2007 12:08 am

#134889 - DekuTree64 - Wed Jul 18, 2007 12:12 am

#134893 - masscat - Wed Jul 18, 2007 1:02 am