gbadev.org forum archive

With quite a bit of hacking I managed to get the much more optimized YM3812 sound chip emulator from ScummVM working on the arm7.
Unfortunately I have not been able to get good results. The sound is generated fine, but I'm not sure whether the buffer isn't being filled fast enough or something else is wrong.

Get a profiler. :^)
_________________
July 5th 08: "Volumetric Shadow Demo" 1.6.0 (final) source released
June 5th 08: "Zombie NDS" WIP released!
It's all on my page, just click WWW below.

I don't know of any arm7 profilers or I would use one, though I think it may have been a buffering error on my part.

The number of samples produced by the emulator can be more or less than what is needed to fill the buffer at the required time.
I'm still trying to wrap my head around this but I think I know what I need to do.

Set a timer, and look at the value when your subroutine finishes. Write that value somewhere in the IPC struct where the ARM9 can pick it up and use it. (You should probably be able to finish this profiling before WinterMute and friends change libnds to no longer use an IPC struct.)
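A minimal host-side sketch of the arithmetic behind that suggestion, assuming a free-running 16-bit hardware timer and a bus clock of 33513982 Hz. The helper names `elapsed_ticks` and `ticks_to_us` are hypothetical and not part of libnds; reading the timer register itself is left out.

```c
#include <stdint.h>

/* Assumed NDS bus clock in Hz. The hardware timers count at this rate
   divided by the selected prescaler (1, 64, 256 or 1024). */
#define BUS_CLOCK_HZ 33513982u

/* Ticks elapsed between two reads of a free-running 16-bit timer.
   Truncating the difference to 16 bits handles one wrap past 0xFFFF. */
static uint16_t elapsed_ticks(uint16_t start, uint16_t end)
{
    return (uint16_t)(end - start);
}

/* Convert a tick count to microseconds for a given prescaler. */
static uint32_t ticks_to_us(uint32_t ticks, uint32_t prescaler)
{
    return (uint32_t)(((uint64_t)ticks * prescaler * 1000000u) / BUS_CLOCK_HZ);
}
```

On hardware, the two timer reads would bracket the call being measured (for example EmulateIMFTick()), and the result would be written somewhere the ARM9 can display it. A subroutine that runs longer than one full timer period needs a larger prescaler or a cascaded timer.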
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

I think I may have the buffering thing figured out but I'll post the relevant code here anyway to get more opinions.

Code:

int EmulateIMFTick( void ) {
int Samples = 0;

while ( SongSize ) {
   OPLWrite( Adlib, Song->r, Song->v );

   if ( Song->pause > 0 ) {
      Samples = ( Song->pause * OPLFreq ) / 700;
      YM3812UpdateOne( Adlib, SampleBuffer, Samples );

      Song++;
      SongSize-= sizeof( IMFCommand );

      return Samples;
   }

   Song++;
   SongSize-= sizeof( IMFCommand );
}

return 0;
}

void PlayTheTune( void ) {
int SamplesNeeded = 0;
int SamplesMade = 0;
int SamplesReady = 0;

SoundCursor = RingBuffer;
BufferPos = RingBuffer;

// Initial buffer fill
SamplesNeeded = ( 11025 * 10 );//512;
do {
   if ( SamplesReady ) {
      // We had some left over samples from the last
      // emulation tick.
      // Just pretend we ran through the emulator :D
      SamplesMade = SamplesReady;
   } else {
      SamplesMade = EmulateIMFTick( );
   }

   if ( SamplesMade == SamplesNeeded ) {
      // Exactly enough samples were generated.
      // Copy them directly into the ring buffer.
      memcpy( BufferPos, SampleBuffer, SamplesMade * 2 );
      break;
   } else if ( SamplesMade > SamplesNeeded ) {
      // More samples were generated than needed.
      // Copy the needed amount of samples into the buffer and move
      // the unused samples to the start of the sample buffer.
      SamplesReady = ( SamplesMade - SamplesNeeded );

      memcpy( BufferPos, SampleBuffer, SamplesNeeded * 2 );
      memmove( SampleBuffer, &SampleBuffer[ SamplesReady ], SamplesReady * 2 );

      break;
   } else {
      // Not enough samples were generated.
      // Do another run through the loop.
      memcpy( BufferPos, SampleBuffer, SamplesMade * 2 );

      BufferPos+= SamplesMade;
      SamplesNeeded-= SamplesMade;
   }
}
while ( SamplesNeeded );

SCHANNEL_SOURCE( 0 ) = RingBuffer;
SCHANNEL_LENGTH( 0 ) = RingBufferSize >> 2;
SCHANNEL_TIMER( 0 ) = SOUND_FREQ( OPLFreq );
SCHANNEL_CR( 0 ) = SCHANNEL_ENABLE | SOUND_ONE_SHOT | SOUND_16BIT | SOUND_VOL( 0x7F ) | SOUND_PAN( 0 );
}

Of course PlayTheTune() is not where the code is supposed to live, but I wanted to make sure the buffering is correct before moving ahead.
It does sound good though at 11025Hz, hopefully the arm7 is fast enough to do this in realtime.
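One detail of the conversion `( Song->pause * OPLFreq ) / 700` in EmulateIMFTick(): the integer division discards a fraction of a sample on most ticks. The sketch below, assuming the 700 Hz tick rate used in the code above, carries the remainder forward so the long-term sample count stays exact. `imf_samples` is a hypothetical helper and not part of the posted code.

```c
#include <stdint.h>

#define IMF_TICK_HZ 700u  /* tick rate taken from the posted code */

/* Convert an IMF pause (in ticks) to a sample count at `rate` Hz.
   *rem holds the fractional part (in units of 1/700 sample) left over
   from earlier calls and is updated on return. */
static uint32_t imf_samples(uint32_t pause, uint32_t rate, uint32_t *rem)
{
    uint32_t total = pause * rate + *rem;

    *rem = total % IMF_TICK_HZ;
    return total / IMF_TICK_HZ;
}
```

With a pause of 1 at 11025Hz, the first call yields 15 samples and the second yields 16, where plain truncation would return 15 every time.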

Another bonus is if someone wants to add YM3812 emulation into the DS port of DOOM so we get music :D

Lazy1 wrote:

It does sound good though at 11025Hz, hopefully the arm7 is fast enough to do this in realtime.

Just a sidenote:
Actually the DS does not support exactly 11025Hz samples. If you keep the casts (integer truncation) in mind when setting the sample rate register, you'll see that you set a slightly different speed. You can't hear that in the pitch (at least I can't), but if you use a timer to fill the buffer you need to account for it, otherwise you will get out of sync sooner or later.

IIRC it was around 11031Hz. To get the exact speed, just look at the actual value written to the sample rate register and do the reverse calculation.
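The reverse calculation can be sketched as follows, assuming the register value is computed as -2^24 / rate with truncation (the arithmetic libnds' SOUND_FREQ macro is understood to use). The function names are hypothetical. The physical clock may differ slightly from 2^24 Hz, so treat the result as approximate.

```c
#include <stdint.h>

/* Reload value for the sound channel timer, truncated toward zero. */
static int32_t sound_timer_reload(int32_t requested_hz)
{
    return -0x1000000 / requested_hz;
}

/* Rate implied by a reload value, using the same 2^24 constant. */
static double effective_rate_hz(int32_t reload)
{
    return (double)0x1000000 / (double)(-reload);
}
```

For a requested 11025Hz the reload is -1521, which corresponds to roughly 11030.4Hz under this assumption.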
_________________
Trying to bring more detail into understanding the wireless hardware

I'll keep that in mind when I get the buffering working properly, so far it "plays" but has short gaps and repeats which probably means my buffering code is broken.

I did make a few modifications and most of the streaming code is from memory when I did it with MP3.

MWHAHAHAHA!
It really works, god only knows why but it actually works.

A few more modifications are needed but this pretty much proves that you can emulate the YM3812 chip on the arm7.
Currently it only runs at full speed at 8000Hz, but for most purposes that should be fine.

Anybody want to start adding music to doom? :D

EDIT:
Too soon, tested another song and it's crackly.
Now I'll have to test the CPU usage.

OK, so increasing the buffer size to 16KB has solved the crackling issue, however there are still a few problems that make no sense at all.

Lots of magic numbers involved in this too, is it a necessary evil of DS audio programming or is it just luck why things aren't (horribly) going wrong?

Code:

int EmulateIMFTick( void ) {
int Samples = 0;

while ( SongSize ) {
   OPLWrite( Adlib, Song->r, Song->v );

   if ( Song->pause > 0 ) {
      Samples = ( Song->pause * OPLFreq ) / 700;
      YM3812UpdateOne( Adlib, SampleBuffer, Samples );

      Song++;
      SongSize-= sizeof( IMFCommand );

      return Samples;
   }

   Song++;
   SongSize-= sizeof( IMFCommand );
}

return 0;
}

short* SoundCursor = NULL;
short* BufferPos = NULL;

void SwapCursor( void ) {
if ( SoundCursor == RingBuffer ) SoundCursor+= ( ( RingBufferSize / 2 ) / 2 );
else SoundCursor = RingBuffer;
}

int SamplesReady = 0;

void BufferFill( int SamplesNeeded ) {
int SamplesMade = 0;

BufferPos = SoundCursor;

do {
   if ( SamplesReady ) {
      // We had some left over samples from the last
      // emulation tick.
      // Just pretend we ran through the emulator :D
      SamplesMade = SamplesReady;
      SamplesReady = 0;
   } else {
      SamplesMade = EmulateIMFTick( );
   }

   if ( SamplesMade == SamplesNeeded ) {
      // Exactly enough samples were generated.
      // Copy them directly into the ring buffer.
      memcpy( BufferPos, SampleBuffer, SamplesMade * 2 );
      BufferPos+= SamplesMade;
      break;
   } else if ( SamplesMade > SamplesNeeded ) {
      // More samples were generated than needed.
      // Copy the needed amount of samples into the buffer and move
      // the unused samples to the start of the sample buffer.
      SamplesReady = ( SamplesMade - SamplesNeeded );

      memcpy( BufferPos, SampleBuffer, SamplesNeeded * 2 );
      memmove( SampleBuffer, &SampleBuffer[ SamplesNeeded ], SamplesReady * 2 );
      //memmove( SampleBuffer, &SampleBuffer[ SamplesReady ], SamplesReady * 2 );

      BufferPos+= SamplesNeeded;

      break;
   } else {
      // Not enough samples were generated.
      // Do another run through the loop.
      memcpy( BufferPos, SampleBuffer, SamplesMade * 2 );

      BufferPos+= SamplesMade;
      SamplesNeeded-= SamplesMade;
   }
}
while ( SamplesNeeded && SongSize );

SwapCursor( );
}

void WaitBufferFinished( void ) {
// TODO:
// Something should go here, but nothing has worked so far.
}

void PlayTheTune( void ) {
int LastTimer2Data = 0;
int Timer2Data = 0;
int i;

SoundCursor = RingBuffer;

// Fill the ring buffer completely before starting
BufferFill( RingBufferSize );

TIMER0_DATA = SOUND_FREQ( OPLFreq ) * 2;
TIMER1_DATA = 65536 - ( ( RingBufferSize / 2 ) / 2 );
TIMER2_DATA = 0;

TIMER0_CR = TIMER_ENABLE | TIMER_DIV_1;
TIMER1_CR = TIMER_ENABLE | TIMER_DIV_1 | TIMER_CASCADE;
TIMER2_CR = TIMER_ENABLE | TIMER_DIV_1 | TIMER_CASCADE;

SCHANNEL_SOURCE( 0 ) = ( u32 ) RingBuffer;
SCHANNEL_LENGTH( 0 ) = RingBufferSize >> 2;
SCHANNEL_TIMER( 0 ) = SOUND_FREQ( OPLFreq );
SCHANNEL_CR( 0 ) = SCHANNEL_ENABLE | SOUND_REPEAT | SOUND_16BIT | SOUND_VOL( 0x7F ) | SOUND_PAN( 0 );

SCHANNEL_SOURCE( 1 ) = ( u32 ) RingBuffer;
SCHANNEL_LENGTH( 1 ) = RingBufferSize >> 2;
SCHANNEL_TIMER( 1 ) = SOUND_FREQ( OPLFreq );
SCHANNEL_CR( 1 ) = SCHANNEL_ENABLE | SOUND_REPEAT | SOUND_16BIT | SOUND_VOL( 0x7F ) | SOUND_PAN( 127 );

WaitBufferFinished( );

while ( SongSize ) {
   // WTF?
   // All this searching and it took adding 1 to TIMER2_DATA
   // to fix the crackling?
   Timer2Data = TIMER2_DATA + 1;

   if ( Timer2Data > LastTimer2Data ) {
      BufferFill( ( RingBufferSize / 2 ) / 2 );
   }

   LastTimer2Data = Timer2Data;
}

// TODO:
// Wait until last part of buffer finishes playing.
WaitBufferFinished( );

SCHANNEL_CR( 0 ) = 0;
SCHANNEL_CR( 1 ) = 0;
TIMER0_CR = 0;
TIMER1_CR = 0;
TIMER2_CR = 0;
}

Once the music starts, the first part of the buffer gets overwritten. That makes sense, since it hasn't finished playing yet.
The problem is that no matter how I try to wait for it, nothing aligns, and that screws up the rest of the stream.

Other than the start screwing up and the end being cut off (WaitBufferFinished issue) the whole thing seems to work fine.
Even 11025Hz works now with the larger buffer, a huge improvement over 8000Hz at least for wolf3d music.

So when do ASM optimizations come into play?
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."

Maybe when I get around to learning arm assembly, I found a nice ebook on it but unfortunately the text to speech option has been disabled on it.

It's fast enough to run at 11025Hz but the timing issue is still there and makes no sense.

Lazy1 wrote:

but unfortunately the text to speech option has been disabled on it.

Have you tried contacting its publisher?

I was going to but got distracted and did something else instead.
Maybe in the next week or so I will.

Lazy1 wrote:

Lots of magic numbers involved in this too, is it a necessary evil of DS audio programming or is it just luck why things aren't (horribly) going wrong?

Nah, no magic or evil so far as I know. The main annoyance is needing separate stream timers in the first place... (You can actually get away with one timer, which is what I do, but two or three should work fine.)

I'm not sure about the beginning and the end... it depends on how you're doing it. If you only track TIMER2, it looks like there's no guarantee EmulateIMFTick() will really fill the whole thing. In that case, wouldn't you have to zero the rest of that half before waiting for it to finish?

Hm, I see BufferFill() swaps the cursor once at the end, apparently expecting to fill half in one go. But then you do this at the start:

Code:

SoundCursor = RingBuffer;
BufferFill( RingBufferSize ); // size in bytes? samples?

If that's in bytes (BufferFill clearly expects a sample count), you could be overrunning the buffer here. But more importantly, where does it put the cursor afterward?
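The unit question can be made explicit with a few helpers. This is a sketch assuming RingBufferSize is a byte count and samples are 16-bit, which is what `RingBufferSize >> 2` and `( RingBufferSize / 2 ) / 2` in the posted code suggest. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef int16_t sample_t;  /* 16-bit PCM, as selected by SOUND_16BIT */

/* Number of samples that fit in a buffer of `bytes` bytes. */
static size_t bytes_to_samples(size_t bytes)
{
    return bytes / sizeof(sample_t);
}

/* Samples in one half of the ring buffer: the amount refilled per swap. */
static size_t half_buffer_samples(size_t ring_bytes)
{
    return bytes_to_samples(ring_bytes) / 2;
}

/* Channel length in 32-bit words, matching RingBufferSize >> 2. */
static size_t channel_length_words(size_t ring_bytes)
{
    return ring_bytes / 4;
}
```

Under these assumptions a 16KB buffer holds 8192 samples, so the initial fill should request at most 8192 samples rather than 16384.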

Lazy1 wrote:

Even 11025Hz works now with the larger buffer, a huge improvement over 8000Hz at least for wolf3d music.

Yikes, 11kHz is ear-grating compared to the real thing! :|

Dwedit wrote:

So when do ASM optimizations come into play?

There's definitely room for that; but IMHO, a lot sorta boils down to loop invariants, and simplifying the hotspots as far as possible. The emulators I looked at grind through every oscillator for every sample, and I don't even want to know how much cache and register thrashing is involved! o_o

Anyhow, I've got some C++/assembler kicking around that can handle 32kHz stereo (admittedly incomplete, with some minor bugs and cheating involved). At that rate, 18 YM voices take around 66% of the ARM9, fast enough to run some decent MIDI through it. I'm sick of optimizing, so it's fallen aside, but if there's enough interest, I'll try to clean it up. >_>

You're right, there were a few bugs in there I missed.
Now the start plays properly but drifts out of alignment fairly quickly.

I'll have to upload the source+binary so it can be heard, 11KHz is not _that_ bad.

Code:

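void WaitBufferFinished( void ) {
// Wait for TIMER2 to tick once (one half of the ring buffer played).
// Comparing for a change instead of "< Future" also survives the
// 16-bit counter wrapping from 0xFFFF back to 0.
u16 Start = TIMER2_DATA;

while ( TIMER2_DATA == Start )
;
}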

void PlayTheTune( void ) {
int LastTimer2Data = 0;
int Timer2Data = 0;
int i;

SoundCursor = RingBuffer;

// Fill the ring buffer completely before starting
BufferFill( ( RingBufferSize / 2 ) / 2 );

TIMER0_DATA = SOUND_FREQ( OPLFreq ) * 2;
TIMER1_DATA = 65536 - ( ( RingBufferSize / 2 ) / 2 );
TIMER2_DATA = 0;

TIMER0_CR = TIMER_ENABLE | TIMER_DIV_1;
TIMER1_CR = TIMER_ENABLE | TIMER_DIV_1 | TIMER_CASCADE;
TIMER2_CR = TIMER_ENABLE | TIMER_DIV_1 | TIMER_CASCADE;

SCHANNEL_SOURCE( 0 ) = ( u32 ) RingBuffer;
SCHANNEL_LENGTH( 0 ) = RingBufferSize >> 2;
SCHANNEL_TIMER( 0 ) = SOUND_FREQ( OPLFreq );
SCHANNEL_CR( 0 ) = SCHANNEL_ENABLE | SOUND_REPEAT | SOUND_16BIT | SOUND_VOL( 0x7F ) | SOUND_PAN( 0 );

SCHANNEL_SOURCE( 1 ) = ( u32 ) RingBuffer;
SCHANNEL_LENGTH( 1 ) = RingBufferSize >> 2;
SCHANNEL_TIMER( 1 ) = SOUND_FREQ( OPLFreq );
SCHANNEL_CR( 1 ) = SCHANNEL_ENABLE | SOUND_REPEAT | SOUND_16BIT | SOUND_VOL( 0x7F ) | SOUND_PAN( 127 );

//WaitBufferFinished( );

while ( SongSize ) {
WaitBufferFinished( );
BufferFill( ( RingBufferSize / 2 ) / 2 );
}

// TODO:
// Wait until last part of buffer finishes playing.
WaitBufferFinished( );

SCHANNEL_CR( 0 ) = 0;
SCHANNEL_CR( 1 ) = 0;
TIMER0_CR = 0;
TIMER1_CR = 0;
TIMER2_CR = 0;
}

EDIT:
One downside is that currently it requires a vram bank mapped to the arm7.

EDIT 2:
I would be interested to hear what hacks I can do to speed it up though, now I see that some songs just won't run at 11KHz :/

Cydrak wrote:

The main annoyance is needing separate stream timers in the first place... (You can actually get away with one timer, which is what I do, but two or three should work fine.)

How would you stream using only one timer?
I think it's time to re-write the streaming code and try to get some idea as to how long it's taking.

I could count hblanks before and after buffer fills, but that would not be very accurate since a different number of samples can be generated each time.

Lazy1 wrote:

How would you stream using only one timer?

Manage them with a list.
If you add an event (from the previous timer IRQ) to the list, check whether the timer is already running:

running: is the time left in the timer > the new timer delay?
- yes: reduce the saved delay in the first element to (time left in timer - new timer delay), then push_front the new event with the new timer delay
- no: iterate through the list until the accumulated delay times are > the new timer delay. Insert the new event just before that element, with a delay of (new timer delay - accumulated delay just before it), and reduce the following element's delay by that same relative delay
not running: obvious

When the timer IRQ then fires, handle the first event in the list and remove it, then set the timer to the delay of the next event. (Beware of a delay of 0.)

All timer events are queued in chronological order, separated by the time that passes between them, using only one timer.

PS: You'll need to account for timer progress while adding a new event. Usually you can just stop the timer briefly and start it again once you are done. If you do this, write the value you just read back to the timer data register, otherwise its data will reset to the last written value on enable (see GBATEK).
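The list described above is essentially a delta queue: each entry stores its delay relative to the entry before it. A minimal sketch follows, using a fixed-size array and hypothetical names (`DeltaQueue`, `dq_insert`, `dq_pop`). It assumes the head entry's delta has already been refreshed to the time left in the hardware timer before inserting, and it leaves all register access out.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_EVENTS 8

typedef struct {
    uint32_t delta;  /* ticks after the previous event (from now, for the head) */
    int      id;     /* caller-defined event identifier */
} TimerEvent;

typedef struct {
    TimerEvent ev[MAX_EVENTS];
    size_t     count;
} DeltaQueue;

/* Insert an event due `delay` ticks from now. Returns 0, or -1 if full. */
static int dq_insert(DeltaQueue *q, uint32_t delay, int id)
{
    size_t i = 0, j;

    if (q->count >= MAX_EVENTS)
        return -1;

    /* Walk past every event due no later than the new one,
       consuming their deltas from the remaining delay. */
    while (i < q->count && q->ev[i].delta <= delay) {
        delay -= q->ev[i].delta;
        i++;
    }

    /* Shift later events up by one slot to make room. */
    for (j = q->count; j > i; j--)
        q->ev[j] = q->ev[j - 1];

    q->ev[i].delta = delay;
    q->ev[i].id = id;
    q->count++;

    /* The follower is now measured relative to the new event. */
    if (i + 1 < q->count)
        q->ev[i + 1].delta -= delay;

    return 0;
}

/* Remove the head event (when the timer fires). Returns its id, or -1. */
static int dq_pop(DeltaQueue *q)
{
    int id;
    size_t j;

    if (q->count == 0)
        return -1;

    id = q->ev[0].id;
    for (j = 1; j < q->count; j++)
        q->ev[j - 1] = q->ev[j];
    q->count--;
    return id;
}
```

After a pop, the new head's delta is the value to program into the timer. A delta of 0 means the next event is due at the same instant and should be handled immediately instead of arming the timer.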

Lazy1 wrote:

How would you stream using only one timer?

Maxxie wrote:

Manage them with a list.

Hmm, I'm not sure I understand your suggestion, since the timers above are not separate timers; they're linked (cascaded) as a large one. (Although I'm curious how you avoid drift while adjusting the timer delay...)

To be sure, the way Lazy1's got it set up:
- TIMER0 counts right along with the sound channel's timer.
- TIMER1 counts a sample every time TIMER0 overflows.
- TIMER2 counts halves of the ring buffer every time TIMER1 overflows. (The same could have been done in TIMER1's IRQ instead.)

Let's say I try:

Code:

playCursor = 0;
TIMER0_DATA = 2*SOUND_FREQ( sampleRate );
...
SCHANNEL_LENGTH(0) = streamSize * sampleBits/32;
SCHANNEL_CR(0) = SCHANNEL_ENABLE | ...;
TIMER0_CR = TIMER_ENABLE | TIMER_IRQ_REQ | TIMER_DIV_1;
...

void streamTimerHr() {
playCursor++;
if(playCursor >= streamSize)
playCursor -= streamSize;
}

This fires an IRQ every sample, sort of a waste, really.

But the thing is, you don't need a super-accurate playback position. You just need to know that it will be between X and X+B, where B is some forward bound in the ring buffer. Then, so long as you keep up, you can refill it up to X-1 whenever you like.

So instead, I do this:

Code:

playCursor = 0;
samplePeriod = 2*SOUND_FREQ( sampleRate );
timerMultiple = -0x10000 / samplePeriod;
timerPeriod = timerMultiple * samplePeriod;
TIMER0_DATA = timerPeriod;
...
SCHANNEL_LENGTH(0) = streamSize * sampleBits/32;
SCHANNEL_CR(0) = SCHANNEL_ENABLE | ...;
TIMER0_CR = TIMER_ENABLE | TIMER_IRQ_REQ | TIMER_DIV_1;
...

void streamTimerHr() {
playCursor += timerMultiple;
if(playCursor >= streamSize)
playCursor -= streamSize;
}

Now if I want to stream at 11025hz or whatever, I'll have:
- samplePeriod = 2 * (-2^24 / 11025) = -3042
- timerMultiple = -0x10000 / -3042 = 21
- timerPeriod = 21 * -3042 = -63882

What I did was pick the largest multiple of samples that fits in the 16-bit timer period, to get a lowish frequency (>= 512Hz). Since the timers and audio derive from the same clock, I still know, within the bounds of the multiple, what samples are being played (which is to say: where in the buffer I should stop writing).

In this case the timer runs around 525Hz and the audio will be ~11030Hz, or something close to that.* To give some perspective, that's a little over 8 IRQs/frame, which seems reasonable. (Hblank IRQs happen around 30 times that rate and I haven't had problems.)

* GBATEK and libnds slightly disagree, never mind physical variations.
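The three values worked out above can be reproduced with a small helper. This is a sketch assuming SOUND_FREQ(n) evaluates to -2^24 / n with truncation and a bus clock of 33513982 Hz. `StreamTiming`, `stream_timing` and `irq_rate_hz` are hypothetical names.

```c
#include <stdint.h>

#define BUS_CLOCK_HZ 33513982.0  /* assumed timer clock in Hz */

typedef struct {
    int32_t sample_period;   /* negative reload: timer cycles per sample */
    int32_t timer_multiple;  /* samples elapsed per timer IRQ */
    int32_t timer_period;    /* negative reload for the 16-bit timer */
} StreamTiming;

/* Pick the largest whole number of samples that fits one timer period. */
static StreamTiming stream_timing(int32_t sample_rate)
{
    StreamTiming t;

    t.sample_period  = 2 * (-0x1000000 / sample_rate);
    t.timer_multiple = -0x10000 / t.sample_period;
    t.timer_period   = t.timer_multiple * t.sample_period;
    return t;
}

/* Approximate IRQ frequency for a given timing. */
static double irq_rate_hz(StreamTiming t)
{
    return BUS_CLOCK_HZ / (double)(-t.timer_period);
}
```

For 11025Hz this gives -3042, 21 and -63882, with an IRQ rate a little under 525Hz.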

Cydrak wrote:

Lazy1 wrote:

How would you stream using only one timer?

Maxxie wrote:

Manage them with a list.

Hmm, I'm not sure I understand your suggestion, since the timers above are not separate timers; they're linked (cascaded) as a large one. (Although I'm curious how you avoid drift while adjusting the timer delay...)

Not the physical timers, just the timer events.

Just lay them out in a list like pearls on a chain.
Set the timer only for the first timer event in the chain and handle it. When that is done, set it for the next timer event, while keeping in mind the time that has already passed since that event was set up ....

One timer active at any time and all events handled.

Consider finding the free version of No$. It has decent timers built in for per cycle profiling.

I did a cheap hack profile sort of thing by seeing how many hblanks pass during a call to EmulateIMFTick().
Apparently the longest it took was 3299 hblanks to generate 3039 samples, maybe someone has the math to figure that one out?

EDIT: At 11025Hz

At 15.7343kHz H-timing, you just spent ~0.21 seconds on 3039 samples, which is ~0.27 seconds of audio (at 11025Hz).

-> That should be enough, BUT

You are spending around 2312 system cycles (33MHz) per sample, which is a LOT for 8? 16? 32? bits.
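The arithmetic behind those figures, as a sketch. It assumes the hblank rate quoted above (15734.3 Hz) and a CPU clock of 33513982 Hz. The function names are hypothetical.

```c
#include <stdint.h>

#define HBLANK_HZ 15734.3     /* hblank rate quoted in the post above */
#define CPU_HZ    33513982.0  /* assumed ARM7 clock in Hz */

/* Wall-clock time covered by a number of hblanks. */
static double hblanks_to_seconds(uint32_t hblanks)
{
    return (double)hblanks / HBLANK_HZ;
}

/* Playback time of a number of samples at a given rate. */
static double audio_seconds(uint32_t samples, uint32_t rate_hz)
{
    return (double)samples / (double)rate_hz;
}

/* Average CPU cycles spent generating one sample. */
static double cycles_per_sample(uint32_t hblanks, uint32_t samples)
{
    return hblanks_to_seconds(hblanks) * CPU_HZ / (double)samples;
}
```

For 3299 hblanks and 3039 samples, generation takes about 0.21 s for about 0.28 s of audio, so roughly three quarters of the available time goes to the emulator, at about 2312 cycles per sample.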

Did you disable interrupts while counting your lines? You may very well be getting VBlank time in with your measurements.

Hmm, disabling all but the hblank interrupt really changes the numbers...

2535 Samples took 2748 hblanks.

OK, So it appears the ring buffer size must be a multiple of the sample rate?
Everything works perfectly now, thanks for all the help :D

DS development > How can I test if sound can be decoded fast enough?

#163251 - Lazy1 - Sat Sep 27, 2008 9:05 am

#163254 - silent_code - Sat Sep 27, 2008 11:34 am

#163256 - Lazy1 - Sat Sep 27, 2008 1:00 pm

#163258 - tepples - Sat Sep 27, 2008 1:16 pm

#163263 - Lazy1 - Sat Sep 27, 2008 6:27 pm

#163264 - Maxxie - Sat Sep 27, 2008 6:50 pm

#163267 - Lazy1 - Sat Sep 27, 2008 8:23 pm

#163274 - Lazy1 - Sun Sep 28, 2008 9:51 am

#163282 - Dwedit - Sun Sep 28, 2008 6:36 pm

#163284 - Lazy1 - Sun Sep 28, 2008 6:47 pm

#163285 - tepples - Sun Sep 28, 2008 6:50 pm

#163286 - Lazy1 - Sun Sep 28, 2008 6:55 pm

#163288 - Cydrak - Sun Sep 28, 2008 8:34 pm

#163291 - Lazy1 - Sun Sep 28, 2008 10:05 pm

#163363 - Lazy1 - Tue Sep 30, 2008 7:33 pm

#163369 - Maxxie - Tue Sep 30, 2008 9:08 pm

#163377 - Cydrak - Wed Oct 01, 2008 1:23 am

#163384 - Maxxie - Wed Oct 01, 2008 6:19 am

#163396 - Miked0801 - Wed Oct 01, 2008 5:27 pm

#163443 - Lazy1 - Thu Oct 02, 2008 2:28 pm

#163445 - Maxxie - Thu Oct 02, 2008 3:07 pm

#163456 - Miked0801 - Thu Oct 02, 2008 6:31 pm

#163458 - Lazy1 - Thu Oct 02, 2008 7:20 pm

#163496 - Lazy1 - Fri Oct 03, 2008 5:29 pm