gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

ASM > Digging deep

#10975 - FluBBa - Mon Sep 22, 2003 4:04 pm

I'm off into deep water right now and wondering if anybody else has any experience with changing CPU mode?
According to the ARM7 manual, FIQ mode switches regs r8-r14 & r16. As the GBA can't make a FIQ from hardware, this could be very useful for doing some hardcore stuff without needing to put stuff on the stack.
Is this feature at all implemented in the GBA's ARM?
Is it safe to use it with other IRQs?
_________________
I probably suck, my not is a programmer.

#10976 - torne - Mon Sep 22, 2003 4:17 pm

Yes, you can safely switch to FIQ mode; the gameboy's FIQ pin is pulled high permanently. All same-model ARM cores are the same; the GBA's ARM7tdmi is exactly the same as every other ARM7tdmi, so anything that's documented by ARM should work (the GBA's core is missing coprocessor 15, but CP15 is not a *required* feature, only a common one).

Remember, though, that you can only enter FIQ mode from IRQ mode on the GBA (since you need to be in a privileged mode to switch). One use is to make your IRQ handler immediately switch to FIQ, do the work, then switch back (it's important that you switch back, as the BIOS's interrupt handler won't be able to clean up properly after you if you leave the CPU in FIQ mode). This gives you those nice extra registers to play with. =)

#10977 - DekuTree64 - Mon Sep 22, 2003 4:44 pm

Actually I was doing some tests on it to see about using those regs in a sound mixer a while back, and it seems GBA programs normally run in system mode, so you can switch to FIQ mode anytime you want. The docs say never to change any of the bits other than the mode/condition flags, but it seems to work fine to just use msr cpsr, #0x11 (FIQ mode) and then msr cpsr, #0x1f (system) to switch back for only 1 cycle each. I tried it on hardware and it definitely did switch to the FIQ regs and back and didn't lock up or anything. You might want to use a full
mrs rTemp, cpsr
bic rTemp, rTemp, #0x1f
orr rTemp, rTemp, #0x11
msr cpsr, rTemp
if you're doing anything serious though, just to be sure nothing bad will happen. You could get rid of that orr by just doing a bic rTemp, rTemp, #0xe if you know you're in system mode (0x1f) to begin with.
But yes, it definitely has some possibilities.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

#10981 - torne - Mon Sep 22, 2003 5:05 pm

Ah, yes, of course it does.. had supervisor and system mode mixed up. The reason ARM demand that you don't destroy the other bits is that they reserve the right to use them in future revisions of the chip. The chip version in the GBA, and indeed the current version of the ARM7tdmi, don't use the other bits, so it's ok to just nuke them (this is not the case when developing for 'normal' hardware where it might be desirable to have your code keep working on future chip revisions).

#10982 - tepples - Mon Sep 22, 2003 5:52 pm

torne wrote:
The reason ARM demand that you don't destroy the other bits is that they reserve the right to use them in future revisions of the chip. The chip version in the GBA, and indeed the current version of the ARM7tdmi, don't use the other bits, so it's ok to just nuke them (this is not the case when developing for 'normal' hardware where it might be desirable to have your code keep working on future chip revisions).

The GBX (Game Boy Extreme, rumored successor to the GBA SP to compete with Sony's PSP) may in fact use such "future chip revisions."
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#10985 - DekuTree64 - Mon Sep 22, 2003 7:06 pm

Yeah, but even so it would probably be a good idea to leave the state (T) bit alone, though I'm pretty sure I was running in ARM mode and set it to 0 and still nothing bad happened (T=0 is ARM state on the ARM7TDMI, so that fits). But with all those free regs, you could use one of your normal r8-r14 for the cpsr value to switch to FIQ, and one of your FIQ r8-r14 for the switch back value. That way you wouldn't have to worry about any future problems, and you'd still have 5 more regs than normal. That might really work out well for a 4chn/side one-pass mixer. How 'bout this register layout...
r0: temp to load samples
r1: temp to hold 2 samples to multiply by vol
r2: temp to mix samples into
r3: 0x8000000 to add to pos>>8
r4: c0 (data << 8) + pos (24.8 fp; the upper 8 bits of the address are always 0x08, so they're shifted off)
r5: c0 inc
r6: c0 vol
r7: c1 data
r8: c1 inc
r9: c1 vol
f8: c2 data
f9: c2 inc
f10: c2 vol
f11: c3 data
f12: c3 inc
f13: c3 vol
r10: signAdjust (see *)
r11: mixBuf
r12: temp
r13: stack
r14: FIQ mode cpsr val
f14: sys mode cpsr val

*multiply channel volumes by -128 (128 is like 0 in 8-bit unsigned) and add them up to get this, corrects for using unsigned samples. Then clear the upper 16 bits and orr the lower 16 into the upper 16 to mix 2 samples at a time. Add samples to it and the result is signed

Then use a loop like
mixLoop:
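ldrb r0, [r3, r4, LSR #8] @c0: load the sample at pos
add r4, r4, r5, LSL #1 @pos += 2*inc
ldrb r1, [r3, r4, LSR #8] @c0: load the sample two steps on
sub r4, r4, r5 @pos -= inc, so the net step is one inc
orr r1, r0, r1, LSL #16 @pack both samples into one word
mla r2, r1, r6, r10 @r2=samples*vol+signAdjust

ldrb r0, [r3, r7, LSR #8] @c1: same again with r7/r8/r9
add r7, r7, r8, LSL #1
ldrb r1, [r3, r7, LSR #8]
sub r7, r7, r8
orr r1, r0, r1, LSL #16
mla r2, r1, r9, r2 @r2+=samples*vol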

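msr cpsr, r14 @switch to FIQ mode (r14 holds the FIQ mode cpsr val)

mix c2 (same six instructions, using f8-f10)
mix c3 (same six instructions, using f11-f13)

msr cpsr, r14 @this is f14 now, so back to system mode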
bic r2, r2, #0xff @drop the low byte of the lower half
bic r12, r2, #0xff0000 @and of the upper half; samples are left in bits 8-15 and 24-31

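repeat everything (we just did the even samples, now mix the odds), except bic into r2 instead of r12, and use add instead of sub on the position after the second load so each channel ends up 4 steps further on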

orr r12, r2, r12, LSR #8
str r12, [r11], #4
ldr r12, [sp, #mixBufEnd]
cmp r11, r12
blt mixLoop

I think it would be pretty fast. Especially if you unroll it a few times so you don't have to load mixBufEnd every time.
But still, that's 6 cycles/sample + (2 for msr's, 4 for bic's, 1 for orr, and 2 for str) for every 4 samples. Not including the looping, since unrolling could cut that down to basically nothing, that's about 6.56 cycles/sample. With a slower loop to deal with channels ending/looping, that ought to come pretty close to matching James Daniels' mixer, which is my bane^_^
Still, I don't think it would quite match it, and even more so with a music player running at the same time.
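
One reading that reproduces the 6.56 figure (an assumption, the post does not spell it out) is that the 6 cycles are per sample per channel, and the fixed overhead is spread over 4 output samples times 4 channels:

```latex
\[
6 + \frac{2_{\text{msr}} + 4_{\text{bic}} + 1_{\text{orr}} + 2_{\text{str}}}
         {4\ \text{samples} \times 4\ \text{channels}}
  = 6 + \frac{9}{16}
  = 6.5625 \approx 6.56\ \text{cycles per sample per channel}
\]
```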

#10986 - torne - Mon Sep 22, 2003 7:06 pm

If it's going to compete with the PSP, it probably won't be an ARM7 any more. =)

Even if it was, ARM don't revise their cores very often (we're talking about the same model here; different models already have slightly differing instruction sets and scheduling models), and none of their revisions to any core model so far have used additional CPSR bits. ARM are very cautious about changing their external interfaces. For example, there is an instruction set change in progress which has been going for a number of years now (deprecating use of the 'never' conditional, thus freeing up all words which have the 'never' prefix for future instructions). They seem pretty determined not to actually stop supporting 'never' until nobody who's writing ARM code can even remember a time when it was standard =)

#10995 - tepples - Mon Sep 22, 2003 11:51 pm

torne wrote:
If it's going to compete with the PSP, it probably won't be an ARM7 any more. =)

It'd probably be an ARM9 with an ARM7-cycle-accurate emulation mode.