gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

Audio > Audio mixing problem

#34953 - gbawiz - Fri Jan 28, 2005 5:50 pm

Hello,
I have a question about the mixing of more than one channel of audio for playback via the GBA audio port.
I have managed to set up an ISR which uses a double-buffer method for streaming the audio to the DMA sound port.

I have created an assembly routine for stepping the sample at the correct mixing frequencies, etc. I got all of the information on stepsample from the following site:

http://oxygen.it.net.au/mixing/section4.html

When I step all 4 channels (or even 3) I get some of the sound, but with a lot of clicking, and it appears that 1/60th of a second is not enough time to mix all 4 channels into the 304-byte mixing buffer.

Should the mixing be done during the ISR?

Thanks

#34956 - poslundc - Fri Jan 28, 2005 6:13 pm

Is your routine in ARM assembly and is it being placed in IWRAM? You should have plenty of processor time to mix four channels even with a relatively inefficient mixer, especially if you aren't trying to make it share the processor with a game yet.

It seems more probable that there is a programming error in your code than that you are running out of processor time. This is especially likely if you've written it in assembly. Does it function correctly when you only have one channel? Try tracing the state of the registers, stack and memory as you increase the number of channels you are mixing.

Dan.

#34957 - gbawiz - Fri Jan 28, 2005 6:22 pm

Here is the assembly code for the stepsample routine which I wrote:
Note: I originally saved the registers before the routine and restored them after it, as shown in the next three lines, but with that in place absolutely nothing happens, so I removed it. (Adding it seems to make things worse.)
stmfd r13!, {r4-r11, r14}
assembly routine here
ldmfd r13!, {r4-r11, r14}


Code:
@---------- r0 is advanced by 4 between loads because each field is a 32-bit (4-byte) word -------------@


@----- CODE START ----@
   .global stepsample

stepsample:

ldr r1,[r0]         @load r1 with mix_lowspeed
add r0,r0,#4      @point r0 to mix_count
ldr r2,[r0]         @load r2 with mix_count
adds r2,r2,r1      @add mix_lowspeed to mix_count & set carry
str r2,[r0]         @store result in mix_count


add r0,r0,#4      @point r0 to mix_curptr
mov r10,r0         @copy r0 to r10
ldr r1,[r0]         @load r1 with mix_curptr
add r0,r0,#4      @point r0 to mix_highspeed
ldr r2,[r0]         @load r2 with mix_highspeed

adc r1,r1,r2      @add mix_highspeed to mix_curptr with carry
add r0,r0,#4      @point r0 to mix_loopend
ldr r2,[r0]         @load r2 with mix_loopend


cmp r1,r2         @compare mix_curptr with mix_loopend
bne savereturn      @if not equal then jump to procedure called savereturn

add r0,r0,#4      @point r0 to mix_looplength
ldr r2,[r0]         @load r2 with mix_looplength
cmp r2,#0         @compare r2 with number zero
bne loopback      @if r2 not zero then goto loopback procedure

add r0,r0,#4      @point r0 to mix_activeflag
mov r3,#0         @load r3 with value zero
str r3,[r0]         @store zero in mix_activeflag
b savereturn      @goto save and return procedure





loopback:
sub r1,r1,r2      @deduct mix_looplength from mix_curptr


savereturn:
str r1,[r10]      @store new value of mix_curptr
bx lr @return;


@------ CODE END ----@
@pool
@endarea
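
For comparison, here is a rough C sketch of what the assembly above appears to do, which may make it easier to check against a known-good version. The struct name `Channel`, the function name `stepsample_c`, and the use of plain 32-bit integers for the pointer fields are assumptions for illustration; only the field names and their order come from the comments in the listing.

```c
#include <stdint.h>

/* Field order mirrors the offsets the assembly walks through (0, 4, 8, ...).
   The struct itself is hypothetical; only the field names come from the
   comments in the listing. */
typedef struct {
    uint32_t mix_lowspeed;    /* fractional part of the step */
    uint32_t mix_count;       /* fractional accumulator */
    uint32_t mix_curptr;      /* current sample position */
    uint32_t mix_highspeed;   /* whole part of the step */
    uint32_t mix_loopend;     /* position at which the sample ends or loops */
    uint32_t mix_looplength;  /* 0 means the sample does not loop */
    uint32_t mix_activeflag;  /* cleared when a one-shot sample finishes */
} Channel;

void stepsample_c(Channel *ch)
{
    /* adds: add the fractional step and remember whether it overflowed */
    uint32_t old_count = ch->mix_count;
    ch->mix_count += ch->mix_lowspeed;
    uint32_t carry = ch->mix_count < old_count;  /* unsigned wraparound == carry out */

    /* adc: add the whole step plus the carry to the sample position */
    uint32_t pos = ch->mix_curptr + ch->mix_highspeed + carry;

    if (pos == ch->mix_loopend) {        /* same equality test as the assembly */
        if (ch->mix_looplength != 0)
            pos -= ch->mix_looplength;   /* loopback */
        else
            ch->mix_activeflag = 0;      /* one-shot sample has finished */
    }
    ch->mix_curptr = pos;                /* savereturn */
}
```

One thing the C form makes easier to see: because the end test is an equality check, a step larger than one sample can carry the position past mix_loopend without ever matching it. A `>=` comparison would be a more forgiving choice, though whether that matters here depends on the step values in use.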



The mixer works when only one channel is enabled (though there is no point in a mixer then).
But when I add more than one, there are problems.
For test purposes I allowed only one channel and then added a time-delay loop, and the result was much the same as when using more than one channel, i.e. a load of noise with the sample playing quite slowly in the background.

#34963 - poslundc - Fri Jan 28, 2005 7:10 pm

I can't comment on the correctness of your routine because I'm having a difficult time figuring out what it's doing.

I can tell you that you are making some very basic errors, though, which may not affect the output of your sound but indicate that your code is hacked together. If your code is being called as a function from C, you must preserve r4-r11 and r14 across the function call, and your routine clobbers r10. You are also setting the status flags in several of your arithmetic operations for no apparent reason.

I would write your mixer code in C first, get it working the way you want to, then look at porting your C code to assembly. That way you'll have a template that you know works for the assembly to be built upon.

Dan.

#34975 - gbawiz - Fri Jan 28, 2005 8:34 pm

I tried writing the mixing routine in C and found that it is worse.
It's as if the 1/60th of a second time is not enough to fill the buffers with the mixed sound.

#34979 - poslundc - Fri Jan 28, 2005 9:54 pm

If the C code is being compiled in ARM mode and placed in IWRAM (see the instructions specific to your devkit to accomplish this), then a full VBlank cycle should be plenty of time to mix four channels into a 304-byte buffer. Even if someone were to write a really lackadaisical C-based mixer, I'd still ballpark that they would be able to mix something on the order of 400 channels in a single draw cycle. There's no way you should be running out of time at four.

If it's not working, it's probably because there's something wrong with your code.

Dan.

#34980 - isildur - Fri Jan 28, 2005 10:08 pm

poslundc wrote:
lackadaisical


Learned a new word today :)

#34981 - poslundc - Fri Jan 28, 2005 10:13 pm

I do my best to enrich the vocabularies of GBA programmers everywhere.

Dan.

#34982 - DekuTree64 - Fri Jan 28, 2005 10:19 pm

poslundc wrote:
Even if someone were to write a really lackadaisical C-based mixer, I'd still ballpark that they would be able to mix something on the order of 400 channels in a single draw cycle


Huh? That would be 0.25% CPU per channel. I've never even broken 1% per channel with my best assembly tricks, and the simple yet not too horribly written mixer in my sound tutorial does take a good chunk of the frame to mix 304 samples with 4 channels. That's THUMB in ROM though.

gbawiz, try something like this to see how long it's taking:
Code:
while (REG_VCOUNT != 0){}
PAL_BG[0] = RGB(31, 0, 0);
SoundMix();
PAL_BG[0] = 0;

Then the red portion of the screen is where it's mixing, and the black portion is unused. If the entire screen is red, then yes, I would say speed is the problem.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

#34986 - poslundc - Fri Jan 28, 2005 11:40 pm

DekuTree64 wrote:
poslundc wrote:
Even if someone were to write a really lackadaisical C-based mixer, I'd still ballpark that they would be able to mix something on the order of 400 channels in a single draw cycle


Huh? That would be 0.25% CPU per channel. I've never even broken 1% per channel with my best assembly tricks, and the simple yet not too horribly written mixer in my sound tutorial does take a good chunk of the frame to mix 304 samples with 4 channels. That's THUMB in ROM though.


Whoops; I meant to type 40 and not 400. *sheepish grin*

(It was an arbitrary figure based on my mixer doing 8 channels at 10% CPU consumption... so I figured 100% consumption could do 80... take half that for a reasonable guess on what an unoptimized mixer might do. But I guess I got a little overexcited with that zero key...)

Dan.

#35011 - gbawiz - Sat Jan 29, 2005 10:55 am

I have added the neat little trick using the palette to indicate the duration of the vblanks as provided by DekuTree64.

Here is what happens:

num_channels_active ---- resulting red percentage
0 ---- 15% red
1 ---- 60% red
2 ---- 95% red (sounds a bit dodgy now)
3 ---- red/black strobe effect (very clicky noise)
----------------------------------------------------------------------------
Here is a rundown of what happens when VBlank occurs:

ISR
if vblank occurs then
set palette to red
call mixer function
set palette to black


-------------------------------------------------------
mixer function
______________
disable DMA flow
swap buffers
enable DMA flow
indicate which buffer now playing

if buffer 1 playing then mix buffer 2
else mix buffer 1

the process of mixing any buffer is:


scan all buffer 'n' locations 0 to buffersize
if channel 0 active then stepsample channel 0
if channel 1 active then stepsample channel 1
if channel 2 active then stepsample channel 2
if channel 3 active then stepsample channel 3
add all together and divide by 4
send result to buffer 'n' [current location]
next location of buffer 'n'

There are 304 buffer locations, with the DMA timer frequency set to match the VBLANK duration:

Timer: 64612
mix frequency: 18157
Buffer sizes: 304
-----------------------------------------------------------
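
The three figures above are consistent with each other, which can be checked with a little arithmetic. This is only a sketch: the function names are made up for illustration, and it assumes the usual GBA figures of a 16777216 Hz system clock and 280896 CPU cycles per frame (228 scanlines of 1232 cycles each).

```c
#include <stdint.h>

#define GBA_CPU_HZ        16777216u  /* 2^24 Hz system clock (assumed) */
#define CYCLES_PER_FRAME  280896u    /* 228 scanlines * 1232 cycles (assumed) */

/* CPU cycles between two output samples at the requested mixing frequency. */
uint32_t cycles_per_sample(uint32_t mix_hz)
{
    return GBA_CPU_HZ / mix_hz;
}

/* The 16-bit timer counts up and overflows at 65536, so the start value is
   65536 minus the number of cycles to wait between samples. */
uint32_t timer_reload(uint32_t mix_hz)
{
    return 65536u - cycles_per_sample(mix_hz);
}

/* Number of samples the DMA consumes in one video frame. */
uint32_t samples_per_frame(uint32_t mix_hz)
{
    return CYCLES_PER_FRAME / cycles_per_sample(mix_hz);
}
```

With a mixing frequency of 18157 this gives 924 cycles per sample, a timer value of 64612 and 304 samples per frame, matching the numbers listed, so the timer setup itself does not look like the source of the clicking.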
I can see that perhaps the mixing routine is taking too much time to execute.
Also, how do I store the ISR in IWRAM?

#35019 - Arjan - Sat Jan 29, 2005 1:48 pm

gbawiz wrote:

scan all buffer 'n' locations 0 to buffersize
if channel 0 active then stepsample channel 0
if channel 1 active then stepsample channel 1
if channel 2 active then stepsample channel 2
if channel 3 active then stepsample channel 3
add all together and divide by 4
send result to buffer 'n' [current location]
next location of buffer 'n'

Is this dividing done by using a real divide, or a bitshift?
_________________
dus.... http://www.bombaman.net

#35021 - gbawiz - Sat Jan 29, 2005 2:39 pm

Arjan wrote:
gbawiz wrote:

scan all buffer 'n' locations 0 to buffersize
if channel 0 active then stepsample channel 0
if channel 1 active then stepsample channel 1
if channel 2 active then stepsample channel 2
if channel 3 active then stepsample channel 3
add all together and divide by 4
send result to buffer 'n' [current location]
next location of buffer 'n'

Is this dividing done by using a real divide, or a bitshift?


I actually use a bitshift (>>2) to bring the sum back within the 8-bit range.

I have also tried another method: adding the samples together into a 16-bit word, then shifting >>2 before storing the result in the output buffer.
The main problem seems to be in the mixing stage, but I cannot find it.

DekuTree64: I was wondering about the palette method, should the screen be totally black or is some red expected?
Also when the palette changes within the VBLANK interrupt, are these changes made visible straight away or does the GBA wait until the VBLANK duration has finished before changing the background colour?

Thanks

#35050 - DekuTree64 - Sat Jan 29, 2005 9:38 pm

gbawiz wrote:
DekuTree64: I was wondering about the palette method, should the screen be totally black or is some red expected?
Also when the palette changes within the VBLANK interrupt, are these changes made visible straight away or does the GBA wait until the VBLANK duration has finished before changing the background colour?

Nope, the palette won't be visible until the screen starts drawing again. That's why I put the while(REG_VCOUNT != 0) first, so you can see exactly where the red starts and then how many lines it takes.
There are 228 lines total (160 visible, 68 in VBlank), so you can see what percent of the total time it's using by linesUsed*100/228.
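
As a small sketch of that calculation: instead of eyeballing the red band, the scanline counter could be read just before and just after the mixer call and the difference converted to a percentage. The helper names here are hypothetical, the two arguments stand in for REG_VCOUNT readings, and it assumes the mixer finishes within one frame.

```c
/* Scanline figures quoted above: 160 visible + 68 in VBlank = 228. */
enum { LINES_PER_FRAME = 228 };

/* start_line and end_line stand in for REG_VCOUNT readings taken just
   before and just after the mixer call. */
int lines_used(int start_line, int end_line)
{
    int n = end_line - start_line;
    if (n < 0)
        n += LINES_PER_FRAME;  /* the counter wrapped from 227 back to 0 */
    return n;
}

/* linesUsed*100/228, as in the formula above (integer division). */
int percent_of_frame(int lines)
{
    return lines * 100 / LINES_PER_FRAME;
}
```

For example, a mixer that starts at line 0 and finishes at line 137 has used 137 lines, which works out to 60% of the frame.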

As for why it's so slow, loading all the channel pointer/volume/etc data for every channel on every sample is a lot of work to be done. My fastest mixer actually did work that way, but it did batches of 4 samples at a time, and kept them in registers while loading new channel data to add into them.
GCC probably isn't smart enough to make it fast even if you did do batches, so I'd suggest looping over your channels, and for each of them, loop over the whole 304 sample buffer, mixing the current channel into it.
You'll need a 16-bit temporary buffer, and because of having to load and store the samples in that temporary buffer so many times, it's a pretty slow algorithm all around. Still, it's simple enough for a compiler to optimize about as well as a human could, so it works pretty good in C.
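
A minimal sketch of that channel-by-channel layout, to show the loop order rather than a finished mixer. Everything named here (`MixChannel`, `mix_frame`, the 20.12 fixed-point position) is an assumption for illustration; volume, looping and end-of-sample handling are left out, and each active channel's data is assumed to be long enough for one full buffer.

```c
#include <stdint.h>
#include <string.h>

#define BUFFER_SIZE  304
#define NUM_CHANNELS 4

/* Hypothetical channel state; 20.12 fixed point is an arbitrary choice. */
typedef struct {
    const int8_t *data;  /* signed 8-bit sample data */
    uint32_t pos;        /* 20.12 fixed-point read position */
    uint32_t step;       /* 20.12 fixed-point increment per output sample */
    int active;          /* nonzero while the channel is playing */
} MixChannel;

static int16_t mix_temp[BUFFER_SIZE];  /* 16-bit temporary accumulation buffer */

void mix_frame(MixChannel *chans, int8_t *out)
{
    memset(mix_temp, 0, sizeof mix_temp);

    /* Outer loop over channels: the channel state is loaded once and then
       kept in locals for the whole 304-sample inner loop. */
    for (int c = 0; c < NUM_CHANNELS; c++) {
        MixChannel *ch = &chans[c];
        if (!ch->active)
            continue;
        const int8_t *data = ch->data;
        uint32_t pos = ch->pos;
        uint32_t step = ch->step;
        for (int i = 0; i < BUFFER_SIZE; i++) {
            mix_temp[i] += data[pos >> 12];
            pos += step;
        }
        ch->pos = pos;  /* write the position back once per channel */
    }

    /* Four 8-bit channels sum to at most 10 bits, so >>2 brings the result
       back into 8-bit range (relies on arithmetic right shift, as GCC does). */
    for (int i = 0; i < BUFFER_SIZE; i++)
        out[i] = (int8_t)(mix_temp[i] >> 2);
}
```

The point of this ordering is that the per-channel loads and stores happen 4 times per frame instead of 4 * 304 times; the price is the extra traffic through the temporary buffer.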

Check out Tepples' TOD at www.pineight.com for a good mixer in this style.

#35222 - gbawiz - Tue Feb 01, 2005 6:58 pm

Hello,
I read before that the mixing routine could be stored in IWRAM to make execution faster, how can I store my mixing function in that area of memory?


Last edited by gbawiz on Fri Feb 18, 2005 5:26 pm; edited 2 times in total

#35232 - tepples - Tue Feb 01, 2005 8:51 pm

The easiest way to get code into IWRAM under recent devkitARM is to put your mixer function in a separate file, naming it 'whatever.iwram.c'. That would get compiled into 'whatever.iwram.o', and the link script would notice the .iwram.o suffix and have the startup code copy it into IWRAM.

Another thing to watch out for is that you need to use a special jump instruction to call code in ROM from RAM or vice versa. In the main code (not the separate file containing the mixer), put __attribute__((long_call)) on the function's prototype (not its implementation) in order to get GCC to generate a long jump instruction:
Code:
__attribute__((long_call)) int mixer(...);

_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#35274 - gbawiz - Wed Feb 02, 2005 12:13 pm

Hello,
I have managed to move the mixing routine into IWRAM and have now noticed an improvement.
I can squeeze 3 voices into the time allocated between VBLANKS but the fourth one being active causes problems.

I believe that there is still too much time occupied by the ISR on the audio procedures and I cannot see how I can build an entire game program when the audio is not functioning fast enough.
(i.e. when ISR is taken up with audio, is there enough room for animations and other game routines in ISR?)
Thanks

#36390 - gbawiz - Fri Feb 18, 2005 5:26 pm

Storing in IWRAM:
Compile the mixer routine into an object file.
Rename the object file to name.text.iwram.o