gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies.

DS development > How can I test if sound can be decoded fast enough?

#163251 - Lazy1 - Sat Sep 27, 2008 9:05 am

With quite a bit of hacking I managed to get the much more optimized YM3812 sound chip emulator from ScummVM working on the arm7.
Unfortunately I have not been able to get good results. The sound is generated fine, but I'm not sure whether the buffer isn't being filled fast enough or something else is wrong.

#163254 - silent_code - Sat Sep 27, 2008 11:34 am

Get a profiler. :^)
_________________
July 5th 08: "Volumetric Shadow Demo" 1.6.0 (final) source released
June 5th 08: "Zombie NDS" WIP released!
It's all on my page, just click WWW below.

#163256 - Lazy1 - Sat Sep 27, 2008 1:00 pm

I don't know of any arm7 profilers or I would, though I think it may have been a buffering error on my part.

The number of samples produced by the emulator can be more or less than what is needed to fill the buffer at the required time.
I'm still trying to wrap my head around this but I think I know what I need to do.

#163258 - tepples - Sat Sep 27, 2008 1:16 pm

Set a timer, and look at the value when your subroutine finishes. Write that value somewhere in the IPC struct where the ARM9 can pick it up and use it. (You should probably be able to finish this profiling before WinterMute and friends change libnds to no longer use an IPC struct.)
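A minimal sketch of the arithmetic behind that suggestion, assuming you read a free-running 16-bit hardware timer once before and once after the subroutine. The function names are made up for illustration, and the prescaler values (1, 64, 256, 1024) are the usual DS timer dividers. Pick a prescaler large enough that the routine finishes before the timer wraps more than once.

```c
#include <stdint.h>

/* Ticks elapsed between two reads of a free-running 16-bit timer.
   Unsigned 16-bit subtraction stays correct across a single wraparound. */
static uint16_t timer_elapsed(uint16_t start, uint16_t end) {
    return (uint16_t)(end - start);
}

/* Convert timer ticks to bus cycles for a given prescaler (1, 64, 256 or 1024). */
static uint32_t ticks_to_cycles(uint16_t ticks, uint32_t prescaler) {
    return (uint32_t)ticks * prescaler;
}
```

On hardware, start and end would be two reads of the timer's data register, and the result is what you would write somewhere the ARM9 can pick it up.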
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#163263 - Lazy1 - Sat Sep 27, 2008 6:27 pm

I think I may have the buffering thing figured out but I'll post the relevant code here anyway to get more opinions.

Code:

int EmulateIMFTick( void ) {
   int Samples = 0;

   while ( SongSize ) {
      OPLWrite( Adlib, Song->r, Song->v );

      if ( Song->pause > 0 ) {
         Samples = ( Song->pause * OPLFreq ) / 700;
         YM3812UpdateOne( Adlib, SampleBuffer, Samples );

         Song++;
         SongSize-= sizeof( IMFCommand );

         return Samples;
      }

      Song++;
      SongSize-= sizeof( IMFCommand );
   }

   return 0;
}

void PlayTheTune( void ) {
   int SamplesNeeded = 0;
   int SamplesMade = 0;
   int SamplesReady = 0;

   SoundCursor = RingBuffer;
   BufferPos = RingBuffer;

   // Initial buffer fill
   SamplesNeeded = ( 11025 * 10 );//512;
   do {
      if ( SamplesReady ) {
         // We had some left over samples from the last
         // emulation tick.
         // Just pretend we ran through the emulator :D
         SamplesMade = SamplesReady;
      } else {
         SamplesMade = EmulateIMFTick( );
      }

      if ( SamplesMade == SamplesNeeded ) {
         // Exactly enough samples were generated.
         // Copy them directly into the ring buffer.
         memcpy( BufferPos, SampleBuffer, SamplesMade * 2 );
         break;
      } else if ( SamplesMade > SamplesNeeded ) {
         // More samples were generated than needed.
         // Copy the needed amount of samples into the buffer and move
         // the unused samples to the start of the sample buffer.
         SamplesReady = ( SamplesMade - SamplesNeeded );

         memcpy( BufferPos, SampleBuffer, SamplesNeeded * 2 );
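         // The leftovers start right after the samples just copied out,
         // so the source offset is SamplesNeeded (not SamplesReady).
         memmove( SampleBuffer, &SampleBuffer[ SamplesNeeded ], SamplesReady * 2 );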

         break;
      } else {
         // Not enough samples were generated.
         // Do another run through the loop.
         memcpy( BufferPos, SampleBuffer, SamplesMade * 2 );

         BufferPos+= SamplesMade;
         SamplesNeeded-= SamplesMade;
      }
   }
   while ( SamplesNeeded );

   SCHANNEL_SOURCE( 0 ) = RingBuffer;
   SCHANNEL_LENGTH( 0 ) = RingBufferSize >> 2;
   SCHANNEL_TIMER( 0 ) = SOUND_FREQ( OPLFreq );
   SCHANNEL_CR( 0 ) = SCHANNEL_ENABLE | SOUND_ONE_SHOT | SOUND_16BIT | SOUND_VOL( 0x7F ) | SOUND_PAN( 0 );
}


Of course PlayTheTune() is not where the code is supposed to be, but I wanted to make sure the buffering is correct before moving ahead.
It does sound good though at 11025Hz, hopefully the arm7 is fast enough to do this in realtime.

Another bonus is if someone wants to add YM3812 emulation into the DS port of DOOM so we get music :D

#163264 - Maxxie - Sat Sep 27, 2008 6:50 pm

Lazy1 wrote:

It does sound good though at 11025Hz, hopefully the arm7 is fast enough to do this in realtime.


Just a sidenote:
Actually, the DS does not support exactly 11025Hz. The value written to the sample rate register is truncated to an integer, so you end up with a slightly different speed. You can't hear that in the pitch (at least I can't), but if you use a timer to fill the buffer you need to account for it, otherwise you will get out of sync sooner or later.

IIRC it was around 11031Hz. To get the exact speed, just look at the actual value written to the sample rate register and do the reverse calculation.
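The reverse calculation can be sketched like this, assuming the 2^24 constant that libnds' SOUND_FREQ divides by (the function names here are made up for illustration). GBATEK quotes a slightly lower hardware clock, so the real-world figure may differ a little from what this prints.

```c
#include <stdint.h>

/* Clock constant assumed from libnds' SOUND_FREQ macro (0x1000000). */
#define SOUND_CLOCK_LIBNDS 16777216

/* Magnitude of the sound timer reload value; integer division truncates. */
static int32_t sound_timer_ticks(int32_t rate_hz) {
    return SOUND_CLOCK_LIBNDS / rate_hz;
}

/* Rate that actually results from a given reload magnitude. */
static double actual_rate_hz(int32_t ticks) {
    return (double)SOUND_CLOCK_LIBNDS / ticks;
}
```

Asking for 11025Hz gives a reload magnitude of 1521, which plays back at roughly 11030.4Hz under that clock assumption.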
_________________
Trying to bring more detail into understanding the wireless hardware

#163267 - Lazy1 - Sat Sep 27, 2008 8:23 pm

I'll keep that in mind when I get the buffering working properly. So far it "plays" but has short gaps and repeats, which probably means my buffering code is broken.

I did make a few modifications, and most of the streaming code is from memory, from when I did it with MP3.

MWHAHAHAHA!
It really works, god only knows why but it actually works.

A few more modifications are needed but this pretty much proves that you can emulate the YM3812 chip on the arm7.
Currently it only runs at full speed at 8000Hz, but for most purposes that should be fine.

Anybody want to start adding music to doom? :D

EDIT:
Too soon, tested another song and it's crackly.
Now I'll have to test the CPU usage.

#163274 - Lazy1 - Sun Sep 28, 2008 9:51 am

OK, so increasing the buffer size to 16KB has solved the crackling issue. However, there are still a few problems that make no sense at all.

Lots of magic numbers involved in this too, is it a necessary evil of DS audio programming or is it just luck why things aren't (horribly) going wrong?

Code:

int EmulateIMFTick( void ) {
   int Samples = 0;

   while ( SongSize ) {
      OPLWrite( Adlib, Song->r, Song->v );

      if ( Song->pause > 0 ) {
         Samples = ( Song->pause * OPLFreq ) / 700;
         YM3812UpdateOne( Adlib, SampleBuffer, Samples );

         Song++;
         SongSize-= sizeof( IMFCommand );

         return Samples;
      }

      Song++;
      SongSize-= sizeof( IMFCommand );
   }

   return 0;
}

short* SoundCursor = NULL;
short* BufferPos = NULL;

void SwapCursor( void ) {
   if ( SoundCursor == RingBuffer ) SoundCursor+= ( ( RingBufferSize / 2 ) / 2 );
   else SoundCursor = RingBuffer;
}

int SamplesReady = 0;

void BufferFill( int SamplesNeeded ) {
   int SamplesMade = 0;

   BufferPos = SoundCursor;

   do {
      if ( SamplesReady ) {
         // We had some left over samples from the last
         // emulation tick.
         // Just pretend we ran through the emulator :D
         SamplesMade = SamplesReady;
         SamplesReady = 0;
      } else {
         SamplesMade = EmulateIMFTick( );
      }

      if ( SamplesMade == SamplesNeeded ) {
         // Exactly enough samples were generated.
         // Copy them directly into the ring buffer.
         memcpy( BufferPos, SampleBuffer, SamplesMade * 2 );
         BufferPos+= SamplesMade;
         break;
      } else if ( SamplesMade > SamplesNeeded ) {
         // More samples were generated than needed.
         // Copy the needed amount of samples into the buffer and move
         // the unused samples to the start of the sample buffer.
         SamplesReady = ( SamplesMade - SamplesNeeded );

         memcpy( BufferPos, SampleBuffer, SamplesNeeded * 2 );
         memmove( SampleBuffer, &SampleBuffer[ SamplesNeeded ], SamplesReady * 2 );
         //memmove( SampleBuffer, &SampleBuffer[ SamplesReady ], SamplesReady * 2 );

         BufferPos+= SamplesNeeded;

         break;
      } else {
         // Not enough samples were generated.
         // Do another run through the loop.
         memcpy( BufferPos, SampleBuffer, SamplesMade * 2 );

         BufferPos+= SamplesMade;
         SamplesNeeded-= SamplesMade;
      }
   }
   while ( SamplesNeeded && SongSize );

   SwapCursor( );
}

void WaitBufferFinished( void ) {
   // TODO:
   // Something should go here, but nothing has worked so far.
}

void PlayTheTune( void ) {
   int LastTimer2Data = 0;
   int Timer2Data = 0;
   int i;

   SoundCursor = RingBuffer;

   // Fill the ring buffer completely before starting
   BufferFill( RingBufferSize );

   TIMER0_DATA = SOUND_FREQ( OPLFreq ) * 2;
   TIMER1_DATA = 65536 - ( ( RingBufferSize / 2 ) / 2 );
   TIMER2_DATA = 0;

   TIMER0_CR = TIMER_ENABLE | TIMER_DIV_1;
   TIMER1_CR = TIMER_ENABLE | TIMER_DIV_1 | TIMER_CASCADE;
   TIMER2_CR = TIMER_ENABLE | TIMER_DIV_1 | TIMER_CASCADE;

   SCHANNEL_SOURCE( 0 ) = ( u32 ) RingBuffer;
   SCHANNEL_LENGTH( 0 ) = RingBufferSize >> 2;
   SCHANNEL_TIMER( 0 ) = SOUND_FREQ( OPLFreq );
   SCHANNEL_CR( 0 ) = SCHANNEL_ENABLE | SOUND_REPEAT | SOUND_16BIT | SOUND_VOL( 0x7F ) | SOUND_PAN( 0 );

   SCHANNEL_SOURCE( 1 ) = ( u32 ) RingBuffer;
   SCHANNEL_LENGTH( 1 ) = RingBufferSize >> 2;
   SCHANNEL_TIMER( 1 ) = SOUND_FREQ( OPLFreq );
   SCHANNEL_CR( 1 ) = SCHANNEL_ENABLE | SOUND_REPEAT | SOUND_16BIT | SOUND_VOL( 0x7F ) | SOUND_PAN( 127 );

   WaitBufferFinished( );

   while ( SongSize ) {
      // WTF?
      // All this searching and it took adding 1 to TIMER2_DATA
      // to fix the crackling?
      Timer2Data = TIMER2_DATA + 1;

      if ( Timer2Data > LastTimer2Data ) {
         BufferFill( ( RingBufferSize / 2 ) / 2 );
      }

      LastTimer2Data = Timer2Data;
   }

   // TODO:
   // Wait until last part of buffer finishes playing.
   WaitBufferFinished( );

   SCHANNEL_CR( 0 ) = 0;
   SCHANNEL_CR( 1 ) = 0;
   TIMER0_CR = 0;
   TIMER1_CR = 0;
   TIMER2_CR = 0;
}


Once the music starts the first part of the buffer gets overwritten, which makes sense since it hasn't finished playing yet.
The problem is that no matter how I try to wait for it, nothing aligns, and that screws up the rest of the stream.

Other than the start screwing up and the end being cut off (WaitBufferFinished issue) the whole thing seems to work fine.
Even 11025Hz works now with the larger buffer, a huge improvement over 8000Hz at least for wolf3d music.

#163282 - Dwedit - Sun Sep 28, 2008 6:36 pm

So when do ASM optimizations come into play?
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."

#163284 - Lazy1 - Sun Sep 28, 2008 6:47 pm

Maybe when I get around to learning arm assembly, I found a nice ebook on it but unfortunately the text to speech option has been disabled on it.

It's fast enough to run at 11025Hz but the timing issue is still there and makes no sense.

#163285 - tepples - Sun Sep 28, 2008 6:50 pm

Lazy1 wrote:
but unfortunately the text to speech option has been disabled on it.

Have you tried contacting its publisher?

#163286 - Lazy1 - Sun Sep 28, 2008 6:55 pm

I was going to but got distracted and did something else instead.
Maybe in the next week or so I will.

#163288 - Cydrak - Sun Sep 28, 2008 8:34 pm

Lazy1 wrote:
Lots of magic numbers involved in this too, is it a necessary evil of DS audio programming or is it just luck why things aren't (horribly) going wrong?

Nah, no magic or evil so far as I know. The main annoyance is needing separate stream timers in the first place... (You can actually get away with one timer, which is what I do, but two or three should work fine.)

I'm not sure about the beginning and the end... it depends on how you're doing it. If you only track TIMER2, it looks like there's no guarantee EmulateIMFTick() will really fill the whole thing. In that case, wouldn't you have to zero the rest of that half before waiting for it to finish?

Hm, I see BufferFill() swaps the cursor once at the end, apparently expecting to fill half in one go. But then you do this at the start:
Code:
   SoundCursor = RingBuffer;
   BufferFill( RingBufferSize );   // size in bytes? samples?

If that's in bytes (BufferFill clearly expects a sample count), you could be overrunning the buffer here. But more importantly, where does it put the cursor afterward?
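To make the units explicit, here is a small sketch that keeps everything in samples and derives the writable half from the number of halves already played, instead of toggling a cursor. The 16KB size and all names are assumptions for illustration, not taken from the code above.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_BYTES   16384                          /* hypothetical 16KB buffer */
#define RING_SAMPLES (RING_BYTES / sizeof(int16_t)) /* 16-bit mono samples */
#define HALF_SAMPLES (RING_SAMPLES / 2)

static int16_t ring[RING_SAMPLES];

/* Half 0 plays first. While one half plays, the other one is safe to write.
   halves_played is the number of halves that have finished playing so far. */
static int16_t *write_half(unsigned halves_played) {
    return (halves_played & 1) ? ring : ring + HALF_SAMPLES;
}
```

With this layout the initial fill would be RING_SAMPLES (not RING_BYTES) samples, and every later refill is HALF_SAMPLES into whatever write_half() returns.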

Lazy1 wrote:
Even 11025Hz works now with the larger buffer, a huge improvement over 8000Hz at least for wolf3d music.

Yikes, 11Khz is ear-grating compared to the real thing! :|

Dwedit wrote:
So when do ASM optimizations come into play?

There's definitely room for that; but IMHO, a lot sorta boils down to loop invariants, and simplifying the hotspots as far as possible. The emulators I looked at grind through every oscillator for every sample, and I don't even want to know how much cache and register thrashing is involved! o_o

Anyhow, I've got some C++/assembler kicking around that can handle 32Khz stereo (admittedly incomplete, with some minor bugs and cheating involved). At that rate, 18 YM voices take around 66% of the ARM9, fast enough to run some decent MIDI through it. I'm sick of optimizing, so it's fallen aside, but if there's enough interest, I'll try to clean it up. >_>

#163291 - Lazy1 - Sun Sep 28, 2008 10:05 pm

You're right, there were a few bugs in there I missed.
Now the start plays properly but drifts out of alignment fairly quickly.

I'll have to upload the source+binary so it can be heard, 11KHz is not _that_ bad.

Code:

void WaitBufferFinished( void ) {
   int Future = TIMER2_DATA + 1;

   while ( TIMER2_DATA < Future )
   ;
}

void PlayTheTune( void ) {
   int LastTimer2Data = 0;
   int Timer2Data = 0;
   int i;

   SoundCursor = RingBuffer;

   // Fill the ring buffer completely before starting
   BufferFill( ( RingBufferSize / 2 ) / 2 );

   TIMER0_DATA = SOUND_FREQ( OPLFreq ) * 2;
   TIMER1_DATA = 65536 - ( ( RingBufferSize / 2 ) / 2 );
   TIMER2_DATA = 0;

   TIMER0_CR = TIMER_ENABLE | TIMER_DIV_1;
   TIMER1_CR = TIMER_ENABLE | TIMER_DIV_1 | TIMER_CASCADE;
   TIMER2_CR = TIMER_ENABLE | TIMER_DIV_1 | TIMER_CASCADE;

   SCHANNEL_SOURCE( 0 ) = ( u32 ) RingBuffer;
   SCHANNEL_LENGTH( 0 ) = RingBufferSize >> 2;
   SCHANNEL_TIMER( 0 ) = SOUND_FREQ( OPLFreq );
   SCHANNEL_CR( 0 ) = SCHANNEL_ENABLE | SOUND_REPEAT | SOUND_16BIT | SOUND_VOL( 0x7F ) | SOUND_PAN( 0 );

   SCHANNEL_SOURCE( 1 ) = ( u32 ) RingBuffer;
   SCHANNEL_LENGTH( 1 ) = RingBufferSize >> 2;
   SCHANNEL_TIMER( 1 ) = SOUND_FREQ( OPLFreq );
   SCHANNEL_CR( 1 ) = SCHANNEL_ENABLE | SOUND_REPEAT | SOUND_16BIT | SOUND_VOL( 0x7F ) | SOUND_PAN( 127 );

   //WaitBufferFinished( );

   while ( SongSize ) {
      WaitBufferFinished( );
      BufferFill( ( RingBufferSize / 2 ) / 2 );
   }

   // TODO:
   // Wait until last part of buffer finishes playing.
   WaitBufferFinished( );

   SCHANNEL_CR( 0 ) = 0;
   SCHANNEL_CR( 1 ) = 0;
   TIMER0_CR = 0;
   TIMER1_CR = 0;
   TIMER2_CR = 0;
}


EDIT:
One downside is that currently it requires a vram bank mapped to the arm7.

EDIT 2:
I would be interested to hear what hacks I can do to speed it up though, now I see that some songs just won't run at 11KHz :/

#163363 - Lazy1 - Tue Sep 30, 2008 7:33 pm

Cydrak wrote:
The main annoyance is needing separate stream timers in the first place... (You can actually get away with one timer, which is what I do, but two or three should work fine.)


How would you stream using only one timer?
I think it's time to re-write the streaming code and try to get some idea as to how long it's taking.

I could count hblanks before and after buffer fills but that would not be very accurate since a different amount of samples can be generated each time.

#163369 - Maxxie - Tue Sep 30, 2008 9:08 pm

Lazy1 wrote:

How would you stream using only one timer?


Manage them with a list.
If you add an event (the previous timer IRQ) to the list, check if the timer is already running:

  • running: is the current time left in the timer > the new timer delay?

    • yes: reduce the saved delay in the first element to (current time left in timer - new timer delay), then push_front the new event with the new timer delay
    • no: iterate through the list until the accumulated delay times are > the new timer delay. Insert the new event just before that element, with a delay of (new timer delay - delay accumulated up to that point), and reduce the following element's delay by that same amount

  • not running: obvious


When the timer IRQ fires, handle the first event in the list and remove it, then set the timer to the delay of the next event. (Beware of a delay of 0.)

All timer events are queued in chronological order, separated by the time that passes between them, with only one timer.

PS: You'll need to take care of the timer progressing while you add a new event. Usually you can just stop the timer briefly and start it again when you are done. If you do this, write the value you have just read back to the timer data, otherwise its data will reset to the last written value on enable (see GBATEK).
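A rough sketch of such a list, assuming a small fixed-size array and made-up names (DeltaQueue, dq_insert, dq_pop). Each entry stores its delay relative to the entry before it, and the first entry is assumed to hold the time still left on the running timer. Reprogramming the hardware timer is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_EVENTS 8

typedef struct {
    uint32_t delta; /* ticks after the previous event in the queue */
    int id;         /* hypothetical event identifier */
} TimerEvent;

typedef struct {
    TimerEvent ev[MAX_EVENTS];
    size_t count;
} DeltaQueue;

/* Queue an event `delay` ticks from now. Returns 0, or -1 if the queue is full. */
static int dq_insert(DeltaQueue *q, uint32_t delay, int id) {
    size_t i = 0, j;
    if (q->count >= MAX_EVENTS) return -1;
    /* Skip events that fire no later than the new one, making `delay`
       relative to the event just before the insertion point. */
    while (i < q->count && q->ev[i].delta <= delay) {
        delay -= q->ev[i].delta;
        i++;
    }
    for (j = q->count; j > i; j--) q->ev[j] = q->ev[j - 1];
    q->ev[i].delta = delay;
    q->ev[i].id = id;
    q->count++;
    /* The event that now follows fires `delay` ticks after the new one less. */
    if (i + 1 < q->count) q->ev[i + 1].delta -= delay;
    return 0;
}

/* On timer expiry: remove the head and return its id, or -1 if empty.
   The caller would then start the timer with q->ev[0].delta. */
static int dq_pop(DeltaQueue *q) {
    size_t j;
    int id;
    if (q->count == 0) return -1;
    id = q->ev[0].id;
    for (j = 1; j < q->count; j++) q->ev[j - 1] = q->ev[j];
    q->count--;
    return id;
}
```

Events due at the same moment end up with a delta of 0, so the IRQ handler should fire those immediately rather than starting the timer with a zero delay.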

#163377 - Cydrak - Wed Oct 01, 2008 1:23 am

Lazy1 wrote:
How would you stream using only one timer?

Maxxie wrote:
Manage them with a list.

Hmm, I'm not sure I understand your suggestion, since the timers above are not separate timers; they're linked (cascaded) as a large one. (Although I'm curious how you avoid drift while adjusting the timer delay...)

To be sure, the way Lazy1's got it set up:
- TIMER0 counts right along with the sound channel's timer.
- TIMER1 counts a sample every time TIMER0 overflows.
- TIMER2 counts halves of the ring buffer every time TIMER1 overflows. (The same could have been done in TIMER1's IRQ instead.)

Let's say I try:
Code:
playCursor         = 0;
TIMER0_DATA        = 2*SOUND_FREQ( sampleRate );
...
SCHANNEL_LENGTH(0) = streamSize * sampleBits/32;
SCHANNEL_CR(0)     = SCHANNEL_ENABLE | ...;
TIMER0_CR          = TIMER_ENABLE | TIMER_IRQ_REQ | TIMER_DIV_1;
...

void streamTimerHr() {
    playCursor++;
    if(playCursor >= streamSize)
        playCursor -= streamSize;
}

This fires an IRQ every sample, sort of a waste, really.

But the thing is, you don't need a super-accurate playback position. You just need to know that it will be between X and X+B, where B is some forward bound in the ring buffer. Then, so long as you keep up, you can refill it up to X-1 whenever you like.

So instead, I do this:
Code:
playCursor         = 0;
samplePeriod       = 2*SOUND_FREQ( sampleRate );
timerMultiple      = -0x10000 / samplePeriod;
timerPeriod        = timerMultiple * samplePeriod;
TIMER0_DATA        = timerPeriod;
...
SCHANNEL_LENGTH(0) = streamSize * sampleBits/32;
SCHANNEL_CR(0)     = SCHANNEL_ENABLE | ...;
TIMER0_CR          = TIMER_ENABLE | TIMER_IRQ_REQ | TIMER_DIV_1;
...

void streamTimerHr() {
    playCursor += timerMultiple;
    if(playCursor >= streamSize)
        playCursor -= streamSize;
}

Now if I want to stream at 11025hz or whatever, I'll have:
- samplePeriod = 2 * (-2^24 / 11025) = -3042
- timerMultiple = -0x10000 / -3042 = 21
- timerPeriod = 21 * -3042 = -63882

What I did was pick the largest multiple of samples that fits in the 16-bit timer period, to get a lowish frequency (>= 512hz). Since the timers and audio derive from the same clock, I still know--within the bounds of the multiple--what samples are being played (which is to say: where in the buffer I should stop writing).

In this case the timer runs around 525hz and the audio will be ~11030hz, or something close to that.* To give some perspective, that's a little over 8 IRQs/frame, which seems reasonable. (Hblank IRQs happen around 30 times that rate and I haven't had problems.)

* GBATEK and libnds slightly disagree, never mind physical variations.
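The numbers and the "refill up to X-1" rule can be sketched as below. This uses positive magnitudes instead of the negative reload values, and the function names are invented for the example. It also assumes playback starts with a full buffer, so write position == play cursor is treated as "nothing writable".

```c
#include <stdint.h>

/* Largest whole number of sample periods that fits in a 16-bit timer.
   sample_period is the positive magnitude, e.g. 3042 for 11025Hz. */
static uint32_t timer_multiple(uint32_t sample_period) {
    return 0x10000u / sample_period;
}

/* Samples that may be written starting at write_pos without reaching
   play_cursor, the last position the timer IRQ reported (X above).
   Writing this many samples fills the ring up to X-1. */
static uint32_t writable_samples(uint32_t write_pos, uint32_t play_cursor,
                                 uint32_t stream_size) {
    return (play_cursor + stream_size - write_pos) % stream_size;
}
```

After the first IRQ at 11025Hz the cursor has moved by 21 samples, so a writer sitting at position 0 may refill exactly those 21 samples.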

#163384 - Maxxie - Wed Oct 01, 2008 6:19 am

Cydrak wrote:
Lazy1 wrote:
How would you stream using only one timer?

Maxxie wrote:
Manage them with a list.

Hmm, I'm not sure I understand your suggestion, since the timers above are not separate timers; they're linked (cascaded) as a large one. (Although I'm curious how you avoid drift while adjusting the timer delay...)


Not the physical timers, just the timer events.

Just lay them out in a list like pearls on a chain.
Set the timer only for the first timer event in the chain and handle it. When it's done, set it for the next timer event, keeping in mind the time that has already passed since that event was created ....

One timer active at any time and all events handled.

#163396 - Miked0801 - Wed Oct 01, 2008 5:27 pm

Consider finding the free version of No$gba. It has decent timers built in for per-cycle profiling.

#163443 - Lazy1 - Thu Oct 02, 2008 2:28 pm

I did a cheap hack profile sort of thing by seeing how many hblanks pass during a call to EmulateIMFTick().
Apparently the longest it took was 3299 hblanks to generate 3039 samples, maybe someone has the math to figure that one out?

EDIT: At 11025Hz

#163445 - Maxxie - Thu Oct 02, 2008 3:07 pm

At 15.7343kHz H-timing, you just spent ~0.21 seconds to generate 3039 samples = ~0.27 seconds of audio (at 11025Hz)

-> should be enough, BUT

You are spending around 2312 system cycles (33MHz) per sample, which is a LOT for 8? 16? 32? bits.
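For anyone who wants to redo the math with their own counts, a small sketch. The hblank rate is the figure quoted above, the CPU clock is the 33.513982MHz value from GBATEK, and the helper names are made up.

```c
#define HBLANK_HZ 15734.3     /* H-timing figure quoted above */
#define CPU_HZ    33513982.0  /* ARM7 clock according to GBATEK */

/* Wall-clock time covered by a number of hblanks. */
static double hblanks_to_seconds(unsigned hblanks) {
    return hblanks / HBLANK_HZ;
}

/* Playback time covered by a number of samples at a given rate. */
static double audio_seconds(unsigned samples, double rate_hz) {
    return samples / rate_hz;
}

/* Average CPU cycles spent generating one sample. */
static double cycles_per_sample(unsigned hblanks, unsigned samples) {
    return hblanks_to_seconds(hblanks) * CPU_HZ / samples;
}
```

Dividing the first result by the second gives the fraction of real time spent generating audio, about 0.76 for the 3299 hblank / 3039 sample case.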

#163456 - Miked0801 - Thu Oct 02, 2008 6:31 pm

Did you disable interrupts while counting your lines? You may very well be getting VBlank time in with your measurements.

#163458 - Lazy1 - Thu Oct 02, 2008 7:20 pm

Hmm, disabling all but the hblank interrupt really changes the numbers...

2535 Samples took 2748 hblanks.

#163496 - Lazy1 - Fri Oct 03, 2008 5:29 pm

OK, so it appears the ring buffer size must be a multiple of the sample rate?
Everything works perfectly now, thanks for all the help :D