gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

DS development > What's wrong with memcpy() ?

#177211 - sverx - Tue Jan 24, 2012 2:46 pm

Is it just happening to me that memcpy() can't copy 4 bytes? It doesn't even work with 8, I had to copy 16 bytes to make it work :| (dest and src are word aligned...)
_________________
libXM7|NDS programming tutorial (Italiano)|Waimanu DS / GBA|A DS Homebrewer's Diary

#177213 - elhobbs - Tue Jan 24, 2012 4:39 pm

Are you copying to main memory or to some other region like VRAM?

#177215 - Dwedit - Tue Jan 24, 2012 6:58 pm

Step through with a debugger and see what's going on? (Are there any good GBA debuggers out there besides NO$GBA paid version?)

Anyway, I like to replace memcpy with my own version, because the standard one doesn't use an optimized ldmia/stmia loop to copy.
Tonc has a Replacement memcpy function written in assembly that's more optimized. Except it uses word counts instead of byte counts. One instruction at the beginning can change that.
I use a different one myself.
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."

#177216 - sverx - Wed Jan 25, 2012 9:28 am

elhobbs: I'm copying from main memory to VRAM, but both source and destination are word aligned and I'm copying an even number of halfwords (so no partial words)

Dwedit: Yes, I do plan to change it to a faster memcpy() later if needed, but it really left me astonished. Could that be a bug in the memcpy() function, or does it only work from 16 bytes up by design? (I have no debugger to use. I checked the file generated with --save-temps and it seems to me the memcpy() call is correct...)

#177217 - elhobbs - Wed Jan 25, 2012 2:48 pm

16 bytes is the threshold for word copies in memcpy. 4 bytes is the threshold for half word copies and the rest is byte copies. I am surprised that the half word copies are not working - there must be something missing here. I suspect this works fine from main memory to main memory as I use it quite a bit for that purpose - with no problems.

I have just always assumed that memcpy should not be used for vram as there are faster options - like dma or ldmia/stmia.

#177218 - sverx - Wed Jan 25, 2012 2:57 pm

Quote:
4 bytes is the threshold for half word copies

So it should be working in my case, but it isn't. :|

mmm... well, I might have made a wrong assumption, then. For instance, when I include a .bin file, is its starting address word aligned?

#177219 - elhobbs - Wed Jan 25, 2012 5:33 pm

I am not sure. You could look at the address in the map file, or you could try adding an alignment attribute to the definition; ALIGN(4) should do it.
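The alignment check suggested above can also be done at run time. This is only a sketch: `is_word_aligned`, `example_data` and `check_example_alignment` are hypothetical names, and it assumes ALIGN(4) boils down to GCC's `aligned` attribute, which is written out directly here.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if p sits on a 4-byte boundary, 0 otherwise. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/* Hypothetical buffer: the GCC attribute forces 4-byte alignment. */
static const uint8_t example_data[8] __attribute__((aligned(4))) =
    { 1, 2, 3, 4, 5, 6, 7, 8 };

/* Stops in the assert if the buffer is not word aligned. */
static void check_example_alignment(void)
{
    assert(is_word_aligned(example_data));
}
```

The same test can be applied to the address of an included .bin symbol before passing it to a copy routine.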

#177220 - sverx - Wed Jan 25, 2012 5:49 pm

Right, the map file... mmm... looks like they're all aligned correctly. Every address is a multiple of 4.
mmm... the mystery deepens :|

Code:
 .rodata        0x0200217c     0x3004 bg2test.img.bin.o
                0x0200217c                bg2test_img_bin
                0x0200517c                bg2test_img_bin_end
                0x0200517c                bg2test_img_bin_size
 .rodata        0x02005180      0xc04 bg2test.map.bin.o
                0x02005180                bg2test_map_bin
                0x02005d80                bg2test_map_bin_end
                0x02005d80                bg2test_map_bin_size
 .rodata        0x02005d84     0xe5c4 bgtest.img.bin.o
                0x02005d84                bgtest_img_bin
                0x02014344                bgtest_img_bin_size
                0x02014344                bgtest_img_bin_end
 .rodata        0x02014348     0x3004 bgtest.map.bin.o
                0x02014348                bgtest_map_bin
                0x02017348                bgtest_map_bin_end
                0x02017348                bgtest_map_bin_size
 .rodata        0x0201734c      0x204 bgtest.pal.bin.o
                0x0201734c                bgtest_pal_bin
                0x0201754c                bgtest_pal_bin_size
                0x0201754c                bgtest_pal_bin_end
 .rodata        0x02017550      0x804 sprite.img.bin.o
                0x02017550                sprite_img_bin
                0x02017d50                sprite_img_bin_end
                0x02017d50                sprite_img_bin_size
 .rodata        0x02017d54      0x204 sprite.pal.bin.o
                0x02017d54                sprite_pal_bin
                0x02017f54                sprite_pal_bin_end
                0x02017f54                sprite_pal_bin_size


#177221 - elhobbs - Wed Jan 25, 2012 6:33 pm

I seem to remember that gcc may optimize memcpy calls and replace them with inline code. Can you verify that memcpy is actually being called in this section? You could try it with and without optimizations turned on and see if you get different results.

#177222 - Cearn - Wed Jan 25, 2012 7:01 pm

elhobbs wrote:
16 bytes is the threshold for word copies in memcpy. 4 bytes is the threshold for half word copies and the rest is byte copies.

Are you sure about the halfword thing? As far as I know, it's just byte copies under size 16.

Dwedit wrote:

Tonc has a Replacement memcpy function written in assembly that's more optimized. Except it uses word counts instead of byte counts. One instruction at the beginning can change that.

Instead of tonc's routines, use these: http://coranac.com/files/misc/armfuncs.zip. They work for odd byte-sizes and in unaligned cases as well.

I think I have some timing data lying around here as well somewhere if you're interested, but IIRC it beats memcpy/set in almost all cases.

#177223 - elhobbs - Wed Jan 25, 2012 10:04 pm

Cearn wrote:
elhobbs wrote:
16 bytes is the threshold for word copies in memcpy. 4 bytes is the threshold for half word copies and the rest is byte copies.

Are you sure about the halfword thing? As far as I know, it's just byte copies under size 16.

No, not really sure. I was just looking at the unpatched newlib source code. It did not look like there was a replacement memcpy in the patches for devkitARM. I did not look at the generated asm either.
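The two readings of the threshold can be compared with a small model. This is a sketch of how a generic C memcpy with a 16-byte threshold would behave if Cearn's reading is right, not the actual newlib or devkitARM code; `byte_stores` and the constants are hypothetical, and a 4-byte word size is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_SIZE 4u                /* assumed: 32-bit words on the DS CPUs */
#define BIG_BLOCK (WORD_SIZE * 4u)  /* the 16-byte threshold from the thread */

/* Hypothetical model, not library code: returns how many bytes such a
   memcpy would store one byte at a time. */
static size_t byte_stores(uintptr_t dst, uintptr_t src, size_t len)
{
    int unaligned = ((dst | src) & (WORD_SIZE - 1u)) != 0;

    if (len < BIG_BLOCK || unaligned)
        return len;                 /* the whole copy uses byte stores */

    return len % WORD_SIZE;         /* only the leftover tail is bytewise */
}

/* Word-aligned copies of 4, 8 and 16 bytes, as in the first post. */
static void byte_stores_examples(void)
{
    assert(byte_stores(0x06000000u, 0x0200217cu, 4) == 4);
    assert(byte_stores(0x06000000u, 0x0200217cu, 8) == 8);
    assert(byte_stores(0x06000000u, 0x0200217cu, 16) == 0);
}
```

Under this model a 4 or 8 byte copy is done entirely with byte stores, while a word-aligned 16 byte copy uses none, which would match what sverx observed.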

#177224 - sverx - Thu Jan 26, 2012 10:11 am

Cearn wrote:
Are you sure about the halfword thing? As far as I know, it's just byte copies under size 16.


That would explain the problem, since byte writes to VRAM are ignored. So it's not actually memcpy()'s fault; it's just not 'VRAM safe'...
Not a big problem anyway. I will simply copy 16 bytes for now, and I'll replace memcpy() with something faster if needed.

Thanks! :)

#177225 - kusma - Thu Jan 26, 2012 1:31 pm

memcpy can be implemented to copy a single byte at a time, which doesn't work when you copy to VRAM (at least not on GBA). Perhaps this is your issue?
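For small copies to VRAM, one way around the byte-write problem is a loop that only ever issues 16-bit stores. A minimal sketch, assuming halfword-aligned pointers and an even byte count; `copy16` is a hypothetical helper, not something from libnds, newlib or tonc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies `bytes` bytes using only 16-bit stores,
   so no byte write ever reaches the destination.
   Both pointers must be halfword aligned and `bytes` must be even. */
static void copy16(volatile uint16_t *dst, const uint16_t *src, size_t bytes)
{
    assert(((uintptr_t)dst & 1u) == 0 && ((uintptr_t)src & 1u) == 0);
    assert((bytes & 1u) == 0);

    /* The volatile destination keeps each store as a separate halfword
       write and stops the compiler from turning the loop into memcpy. */
    for (size_t i = 0; i < bytes / 2; i++)
        dst[i] = src[i];
}
```

With this, a 4 byte copy becomes `copy16(dst, src, 4)`. For larger blocks, DMA or an ldmia/stmia routine like the ones mentioned earlier in the thread will be faster.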