gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

DS development > Ugh, DMA set-up time bizarreness

#161597 - simonjhall - Wed Aug 06, 2008 10:55 pm

I think I'm missing something here!
I'm using
Code:
static inline void dmaCopyHalfWordsAsynch(uint8 channel, const void* src, void* dest, uint32 size) {
   DMA_SRC(channel) = (uint32)src;
   DMA_DEST(channel) = (uint32)dest;
   DMA_CR(channel) = DMA_COPY_HALFWORDS | (size>>1);
}
as my DMA function. As far as I can see, there's no blocking going on here. However here's what I'm doing:
Code:
   printf("before %d\n", *(volatile unsigned short *)0x4000006);
   dmaCopyHalfWordsAsynch(0, (void *)0x2750000, (void *)0x2770000, 5 * 1024);
   dmaCopyHalfWordsAsynch(1, (void *)0x2750000, (void *)0x2770000, 5 * 1024);
   dmaCopyHalfWordsAsynch(2, (void *)0x2750000, (void *)0x2770000, 5 * 1024);
   dmaCopyHalfWordsAsynch(3, (void *)0x2750000, (void *)0x2770000, 5 * 1024);
   printf("after %d\n", *(volatile unsigned short *)0x4000006);
I don't care what the data is doing, I don't care that they overlap, and I only vaguely care about the size.

That magic number I'm printing before and after is the vcount register. Only the vertical blank interrupt is on. The ARM7 is idle.
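Because vcount wraps back to zero at the end of every frame, the raw before and after values are only directly comparable when both fall in the same frame. A minimal sketch of a wrap-aware difference follows. The helper name `scanlines_elapsed` is hypothetical, and the figure of 263 scanlines per frame is an assumption about the DS display timing rather than something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the DS produces 263 scanlines per frame, so VCOUNT runs 0..262. */
#define SCANLINES_PER_FRAME 263u

/* Hypothetical helper: number of scanlines that passed between two VCOUNT
   readings, assuming less than one full frame elapsed between them. */
static inline uint32_t scanlines_elapsed(uint32_t before, uint32_t after) {
    assert(before < SCANLINES_PER_FRAME && after < SCANLINES_PER_FRAME);
    /* Adding one frame before subtracting keeps the result non-negative
       when the counter wrapped between the two readings. */
    return (after + SCANLINES_PER_FRAME - before) % SCANLINES_PER_FRAME;
}
```

Reading the register once into a local before the DMA calls and once after, then passing both values to the helper, gives a duration that is still meaningful when the measurement straddles a frame boundary.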

If I comment out DMAs #2, 3 & 4 my vcount advances by one and my vblank count doesn't increase. This is what I'd expect, as what I'm measuring is the set-up time of a DMA. If I uncomment DMA #2 the gap between the two readings is still 0 vblanks, but the vcount has increased to 152. That's quite a long time! If I make it so that they don't overlap I get the exact same result.

If I change the first function call to be to dmaCopyHalfWords (which blocks at the end) the amount of time is the same!

If I change it back to the asynchronous version and uncomment the remaining two DMAs (so all four are apparently running asynchronously) the vcount increases to 190. If I make the first three synchronous and the final one asynchronous, vcount ends up at 191.

I've tried sticking other timing mechanisms around this code (e.g. timers) and I can verify this result. It looks like as soon as you try to schedule a DMA (whether it's asynchronous or not) while another one is already running, it'll block when it's issued. It will literally wait for any pending DMA transfers to finish before a new one can be issued! I think the DMA system has a one-element-deep queue, and if you try to write another one, it'll block the CPU.

Can anyone else please verify this?
_________________
Big thanks to everyone who donated for Quake2

#161599 - HyperHacker - Wed Aug 06, 2008 11:27 pm

I did pretty much the same thing, and when I went from one asynchronous DMA transfer to four, the operation took about 1/4 as long to complete, just as you'd expect. Maybe the issue here is all four are trying to access the same memory?
IIRC, I was copying from RAM to VRAM, maybe RAM to RAM doesn't work as well.
_________________
I'm a PSP hacker now, but I still <3 DS.

#161600 - elhobbs - Thu Aug 07, 2008 12:07 am

I think this has already been confirmed for main memory to main memory DMA copies in this thread:
http://forum.gbadev.org/viewtopic.php?t=13242&postdays=0&postorder=asc&highlight=dma+vram+speed&start=15

#161609 - ritz - Thu Aug 07, 2008 5:03 am

So basically the problem is that the CPU halts after the first set of DMA_* (because it's DMA'ing now) preventing you from running the next few instructions that load the data into the last 3 DMA_* ports.

Quote:
The CPU can be kept running during DMA, provided that it is accessing only TCM (or cached memory), otherwise the CPU is halted until DMA finishes.
Respectively, interrupts executed during DMA will usually halt the CPU (unless the IRQ handler uses only TCM and cache; the IRQ vector at FFFF00xxh must be cached, or relocated to ITCM at 000000xxh, and the IRQ handler may not access IE, IF, or other I/O ports).

So, to use all 4 channels at exactly the same time, one would need to set all 4 of the DMA*FILL ports with addresses ahead of time and then set the start timing:

Code:
  0  Start Immediately
  1  Start at V-Blank
  2  Start at H-Blank (paused during V-Blank)
  3  Synchronize to start of display
  4  Main memory display
  5  DS Cartridge Slot
  6  GBA Cartridge Slot
  7  Geometry Command FIFO

Does this make sense? Please correct me if I'm wrong (I'm not sure if this is valid).
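As a rough sketch of the idea above, a control word with a deferred start mode could be composed as follows. The bit positions are assumptions taken from the commonly documented ARM9 DMA control layout (enable in bit 31, start timing in bits 27-29, transfer count in bits 0-20, 16-bit transfers when bit 26 is clear), and the `SKETCH_`/`sketch_` names are hypothetical, not libnds identifiers. Whether a deferred start actually avoids the stall described in this thread is not confirmed here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ARM9 DMA control layout; these names are hypothetical. */
#define SKETCH_DMA_ENABLE      (1u << 31)  /* bit 31: channel enable */
#define SKETCH_DMA_START_SHIFT 27          /* bits 27-29: start timing, 0..7 */
#define SKETCH_DMA_COUNT_MASK  0x1FFFFFu   /* bits 0-20: transfer count */

/* Build a control word for a 16-bit copy of size_bytes bytes that starts
   according to start_mode (the 0..7 values from the table above). */
static inline uint32_t sketch_dma_control_halfwords(uint32_t start_mode,
                                                    uint32_t size_bytes) {
    assert(start_mode <= 7u);
    uint32_t count = size_bytes >> 1;       /* bytes to halfwords */
    assert(count <= SKETCH_DMA_COUNT_MASK); /* must fit in the count field */
    return SKETCH_DMA_ENABLE | (start_mode << SKETCH_DMA_START_SHIFT) | count;
}
```

With start mode 1 (V-Blank), each channel's source, destination and control registers could be written in turn without any transfer beginning straight away, so in principle all four would be armed before the first one runs.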

#161611 - simonjhall - Thu Aug 07, 2008 7:46 am

ritz wrote:
So basically the problem is that the CPU halts after the first set of DMA_* (because it's DMA'ing now) preventing you from running the next few instructions that load the data into the last 3 DMA_* ports.
Nah, the first one does run asynchronously and the CPU doesn't block, but the instant you try to set another DMA up, it'll pause.

I'll have a go with what you suggest though, and set them up ahead of time and see what happens :-)

#161620 - zeruda - Thu Aug 07, 2008 9:19 pm

HyperHacker wrote:
Specifically, what I'm doing is calling dmaCopyWordsAsynch() for channels 0, 1 and 2 and dmaCopyWords() for channel 3.


Have you tried it like that?

#161623 - Dwedit - Thu Aug 07, 2008 10:06 pm

DmaCopyWords and DmaCopyWordsAsync are both preprocessor macros.
The one without async also contains a spinwait loop for the DMA to finish.
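A spin-wait of that kind might look roughly like the sketch below. It assumes the busy state is reported in bit 31 of the channel's control register; the control register is passed in as a pointer so the logic can be exercised off-device, and the `SKETCH_`/`sketch_` names are hypothetical rather than libnds identifiers.

```c
#include <stdint.h>

/* Assumption: bit 31 of the control register stays set while the channel is busy. */
#define SKETCH_DMA_BUSY (1u << 31)

/* Returns non-zero while the channel behind *cr is still transferring. */
static inline int sketch_dma_is_busy(const volatile uint32_t *cr) {
    return (*cr & SKETCH_DMA_BUSY) != 0;
}

/* Blocking wait: poll the control register until the busy bit clears.
   The volatile qualifier forces a fresh read on every iteration. */
static inline void sketch_dma_wait(const volatile uint32_t *cr) {
    while (sketch_dma_is_busy(cr)) {
        /* spin */
    }
}
```

On hardware the pointer would refer to the channel's control register, and the asynchronous variant would simply skip the final wait.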
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."