#178206 - Jim - Sun Jul 27, 2014 5:16 am
Hi,
I wondered what the reason was for the bug in the NDS BIOS function "CpuFastSet" (described in GBATEK). Now that I have disassembled the NDS9 BIOS, I finally found the answer and wanted to share it here.
Maybe some people will use a corrected version in the future.
Here is the code. It can be found in the NDS9 BIOS at address 0x940.
The important part is the first 5 instructions, or more precisely the 2nd through the 5th.
r10 and lr are calculated to hold the end addresses up to which the memory has to be copied. lr is r1 plus a multiple of 4, and r10 is r1 plus a multiple of 32 (or should be).
Here is an example of these 4 lines:
r1 is the destination address that the offsets are added to
r2 is the length, i.e. the number of words to copy
lr is the end address of the destination
r10 should be the end address of the part whose length is divisible by 32
For example, with r2 = 81 words, lr is r1 + 324, but r10 is r1 + 80 instead of r1 + 320 as it should be.
So the last of these instructions should shift left by 5, not only by 3 (add r10, r1, r10, lsl #5). That's the reason why only a quarter of the bytes are copied the fast way.
So that's what I found out. In the end it wasn't much, but at least I know the reason now. I hope it is interesting for someone else too.
Code (CpuFastSet, NDS9 BIOS, address 0x940):
CpuFastSet:
b940:
    stmdb   sp!, {r4, r5, r6, r7, r8, r9, r10, lr}
    mov     r10, r2, lsl #11       @ r10 = count << 11 (upper 11 bits of r2, incl. the mode bit, are shifted out)
    add     lr, r1, r10, lsr #9    @ lr = r1 + count*4: end of the whole destination
    mov     r10, r10, lsr #14      @ r10 = count / 8: number of 8-word blocks
    add     r10, r1, r10, lsl #3   @ r10 = r1 + blocks*8: end address for the fast loop (the bug)
    movs    r2, r2, lsr #25        @ Test if Bit 24 is set (fixed source address)
    bcc     b998
    ldr     r2, [r0]               @ Copy from fixed source
    mov     r3, r2
    mov     r4, r2
    mov     r5, r2
    mov     r6, r2
    mov     r7, r2
    mov     r8, r2
    mov     r9, r2
b97C:
    cmp     r1, r10
    stmltia r1!, {r2, r3, r4, r5, r6, r7, r8, r9}
    blt     b97C
b988:                              @ copy the rest
    cmp     r1, lr
    stmltia r1!, {r2}
    blt     b988
    b       b9B8
b998:                              @ Copy from a not fixed source
    cmp     r1, r10
    ldmltia r0!, {r2, r3, r4, r5, r6, r7, r8, r9}
    stmltia r1!, {r2, r3, r4, r5, r6, r7, r8, r9}
    blt     b998
b9A8:                              @ copy the rest
    cmp     r1, lr
    ldmltia r0!, {r2}
    stmltia r1!, {r2}
    blt     b9A8
b9B8:
    ldmia   sp!, {r4, r5, r6, r7, r8, r9, r10, lr}
    bx      lr
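To see how the two loops of the listing share the work, here is a rough Python model of its control flow. It is only a sketch, not BIOS code: the function name simulate_fast_copy and the parameter block_shift are made up for illustration, offsets are taken relative to r1, and it assumes a word count below 2^21 and no address wrap-around.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def simulate_fast_copy(word_count, block_shift=3):
    """Return (words moved by the 8-word loop, words moved by the 1-word loop)."""
    t = (word_count << 11) & MASK32         # mov r10, r2, lsl #11
    end_all = t >> 9                        # lr  - r1 = word_count * 4
    end_blocks = (t >> 14) << block_shift   # r10 - r1, the BIOS uses lsl #3
    dst = 0
    block_words = 0
    while dst < end_blocks:                 # cmp r1, r10 / stmltia / blt
        dst += 32                           # one ldm/stm of 8 registers
        block_words += 8
    single_words = 0
    while dst < end_all:                    # cmp r1, lr / stmltia / blt
        dst += 4
        single_words += 1
    return block_words, single_words

print(simulate_fast_copy(81))                 # (24, 57) with the BIOS shift
print(simulate_fast_copy(81, block_shift=5))  # (80, 1) with the corrected shift
```

In this model all 81 words are copied in both cases. The difference is only how many of them go through the slow one-word loop.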
Code (example of the four address instructions, r2 = 81):
@ r2 = 81
@ r10 = 81 << 11
@ lr = r1 + (r10 >> 9) = r1 + ((81 << 11) >> 9) = r1 + (81 << 2) = r1 + 81*4 = r1 + 324
@ r10 = (r10 >> 14) = (81 << 11) >> 14 = 81 >> 3 = 10
@ r10 = r1 + (r10 << 3) = r1 + (10 << 3) = r1 + 80
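The same arithmetic can be checked by mirroring the four instructions in Python. This is a minimal sketch: the function name end_addresses and the destination address 0x02000000 are only assumptions for the example, and the masks stand in for the 32-bit registers.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def end_addresses(r1, r2):
    """Mirror the four address instructions and return (lr, r10)."""
    r10 = (r2 << 11) & MASK32                 # mov r10, r2, lsl #11
    lr = (r1 + (r10 >> 9)) & MASK32           # add lr, r1, r10, lsr #9
    r10 = r10 >> 14                           # mov r10, r10, lsr #14
    r10 = (r1 + (r10 << 3)) & MASK32          # add r10, r1, r10, lsl #3
    return lr, r10

base = 0x02000000  # hypothetical destination address
lr, r10 = end_addresses(base, 81)
print(lr - base, r10 - base)  # 324 80
```

Setting bit 24 in r2 (fixed source) does not change the result, because the first shift pushes it out of the register.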
Code (corrected calculation, shift left by 5 instead of 3):
@ r10 = r1 + (r10 << 5) = r1 + (10 << 5) = r1 + 320
or in ASM:
add r10, r1, r10, lsl #5
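As a quick sanity check of the "quarter" claim, the two shifts can be compared over a range of word counts. This is only a sketch: block_end_offset is a made-up helper, and it uses count >> 3 directly, which equals the lsl #11 / lsr #14 pair for word counts below 2^21.

```python
def block_end_offset(word_count, shift):
    """Offset of r10 from r1 for a given final shift amount."""
    return (word_count >> 3) << shift

for n in range(4096):
    correct = block_end_offset(n, 5)   # lsl #5
    buggy = block_end_offset(n, 3)     # lsl #3, as found in the BIOS
    assert correct == (n & ~7) * 4     # whole 8-word blocks, in bytes
    assert buggy * 4 == correct        # the BIOS value is one quarter of that

print(block_end_offset(81, 3), block_end_offset(81, 5))  # 80 320
```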