gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies.

DS development > NDSBios-Function: CpuFastSet-Bug

#178206 - Jim - Sun Jul 27, 2014 5:16 am

Hi,

I had been wondering what causes the bug in the NDS BIOS function "CpuFastSet" (described in GBATEK). Now that I have disassembled the NDS9 BIOS, I finally found the answer and wanted to share it here.
Maybe some people will use a corrected version in the future as a result.

Here is the code. It can be found in the NDS9 BIOS at address 0x940.

CpuFastSet:
b940:
		stmdb	sp!, {r4, r5, r6, r7, r8, r9, r10, lr}
		mov		r10, r2, lsl #11
		add		lr, r1, r10, lsr #9
		mov		r10, r10, lsr #14
		add		r10, r1, r10, lsl #3

		movs	r2, r2, lsr #25	@ Test if Bit 24 is set (fixed source address)
		bcc		b998
		ldr		r2, [r0]		@ Copy from fixed source
		mov		r3, r2
		mov		r4, r2
		mov		r5, r2
		mov		r6, r2
		mov		r7, r2
		mov		r8, r2
		mov		r9, r2
b97C:
		cmp		r1, r10
		stmltia	r1!, {r2, r3, r4, r5, r6, r7, r8, r9}
		blt		b97C
b988:							@ copy the rest
		cmp	r1, lr
		stmltia	r1!, {r2}
		blt		b988
		b		b9B8
b998:							@ Copy from a not fixed source
		cmp	r1, r10
		ldmltia	r0!, {r2, r3, r4, r5, r6, r7, r8, r9}
		stmltia	r1!, {r2, r3, r4, r5, r6, r7, r8, r9}
		blt		b998
b9A8: 							@ copy the rest
		cmp	r1, lr
		ldmltia	r0!, {r2}
		stmltia	r1!, {r2}
		blt	b9A8
b9B8:
		ldmia	sp!, {r4, r5, r6, r7, r8, r9, r10, lr}
		bx	lr

The important part is the first five instructions, or more precisely the 2nd through 5th.
They compute r10 and lr, the end addresses up to which memory has to be written. The offset added to r1 for lr is a multiple of 4, and the offset added for r10 is a multiple of 32 (or should be).

Here is a worked example of these four instructions:
r1 is the destination address that the offsets are added to
r2 is the length, i.e. the number of words to copy
lr is the correct end address of the destination
r10 should be the end address of the largest part whose length is divisible by 32

@ r2 = 81

@ r10 = 81 << 11
@ lr = r1 + (r10 >> 9) = r1 + ((81 << 11) >> 9) = r1 + (81 << 2) = r1 + 81*4 = r1 + 324
@ r10 = r10 >> 14 = (81 << 11) >> 14 = 81 >> 3 = 10
@ r10 = r1 + (r10 << 3) = r1 + (10 << 3) = r1 + 80

So r10 ends up as r1 + 80 instead of r1 + 320, as it should be.
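
The same arithmetic can be checked on a PC with a few lines of C. This is only a sketch that mirrors the four instructions from the listing; the names `fastset_ends` and `fastset_compute_ends` are made up for illustration, and the offsets are taken relative to r1 rather than as real addresses.

```c
#include <stdint.h>

/* Byte offsets, relative to the destination pointer r1, that the
   prologue of the listing derives from the length/mode word in r2.
   Both names below are hypothetical and not part of the BIOS. */
typedef struct {
    uint32_t total_end;  /* lr  - r1: end of the whole transfer        */
    uint32_t block_end;  /* r10 - r1: end of the 8-word block phase    */
} fastset_ends;

/* block_shift = 3 reproduces the BIOS instruction (lsl #3);
   block_shift = 5 is the corrected shift proposed in this post. */
static fastset_ends fastset_compute_ends(uint32_t r2, unsigned block_shift)
{
    fastset_ends e;
    uint32_t r10 = r2 << 11;           /* mov r10, r2, lsl #11 (drops the top 11 bits) */
    e.total_end = r10 >> 9;            /* add lr, r1, r10, lsr #9  -> word count * 4   */
    r10 >>= 14;                        /* mov r10, r10, lsr #14    -> word count / 8   */
    e.block_end = r10 << block_shift;  /* add r10, r1, r10, lsl #3 (or #5)             */
    return e;
}
```

Calling `fastset_compute_ends(81, 3)` gives a total end of 324 and a block end of 80, while `fastset_compute_ends(81, 5)` gives the block end of 320 from the example.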

So the last line should shift left by 5, not just by 3. That is the reason why only about a quarter of the bytes are copied the fast way. The correct calculation is:

@ r10 = r1 + (r10 << 5) = r1 + (10 << 5) = r1 + 320

or in ASM:
		add		r10, r1, r10, lsl #5
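
To get a feeling for how much the shift amount matters, the two store loops of the listing can be modelled in C. This is a rough sketch, not BIOS code: `fastset_counts` and `fastset_count_stores` are hypothetical names, positions are byte offsets from the start of the destination, and address wrap-around is assumed not to happen.

```c
#include <stdint.h>

/* Number of loop iterations spent in each phase of the listing.
   Both names below are hypothetical and not part of the BIOS. */
typedef struct {
    uint32_t block_stores;   /* iterations of the 8-word stmia loop */
    uint32_t single_stores;  /* iterations of the 1-word stmia loop */
} fastset_counts;

/* block_shift = 3 models the BIOS as disassembled,
   block_shift = 5 models the corrected shift. */
static fastset_counts fastset_count_stores(uint32_t words, unsigned block_shift)
{
    fastset_counts c = {0, 0};
    uint32_t total_end = (words << 11) >> 9;                    /* lr  - r1 */
    uint32_t block_end = ((words << 11) >> 14) << block_shift;  /* r10 - r1 */
    uint32_t pos = 0;                                           /* r1 - start */

    /* cmp r1, r10 / stmltia r1!, {r2-r9} / blt */
    while (pos < block_end) {
        pos += 32;
        c.block_stores++;
    }
    /* cmp r1, lr / stmltia r1!, {r2} / blt */
    while (pos < total_end) {
        pos += 4;
        c.single_stores++;
    }
    return c;
}
```

For 81 words this model gives 3 block stores and 57 single-word stores with the shift by 3, against 10 block stores and 1 single-word store with the shift by 5.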

So that's just what I found out. In the end it wasn't much, but at least I know the reason now. I hope it is interesting for someone else as well.