gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies.

Graphics > clearing a tilemap with DMA

#68580 - nmain - Wed Jan 25, 2006 9:02 pm

Code:

  // priority 0, tile base: 0600c000
  // 16 color, map base: 06005000
  // 512x256
  REG_BG1CNT = 0x4a0c;
  u32 scratch = 0;
  DMA3COPY (&scratch, 0x06005000, DMA_IMMEDIATE | DMA32 | DMA_SRC_FIXED | (4096 / 4));


This should set every entry in the BG1 tilemap to tile 0, palette 0, with no hflip or vflip. But it doesn't; instead the tilemap is filled with a repeating 32-bit non-zero value. In the debugger, the DMA registers appear to be populated properly, but dereferencing the source address pointer leads to an IWRAM location whose value is not 0. So the DMA is doing exactly what I expect it to, but scratch isn't 0?
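As a sanity check on the numbers in the snippet above, here is a small host-side sketch (plain C, not GBA code) that decodes the 0x4a0c written to REG_BG1CNT and works out the map size. The field layout is the usual text-mode BGxCNT layout; the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper type: the fields of a text-mode BGxCNT value. */
typedef struct {
    unsigned priority;   /* bits 0-1 */
    uint32_t tile_base;  /* bits 2-3, in 16 KiB steps from 0x06000000 */
    unsigned bpp;        /* bit 7: clear = 16 colors (4bpp), set = 256 colors (8bpp) */
    uint32_t map_base;   /* bits 8-12, in 2 KiB steps from 0x06000000 */
    unsigned width_px;   /* bits 14-15 select the map size */
    unsigned height_px;
} BgCnt;

BgCnt decode_bgcnt(uint16_t v)
{
    BgCnt b;
    unsigned size = (v >> 14) & 3;

    b.priority  = v & 3;
    b.tile_base = 0x06000000u + ((v >> 2) & 3) * 0x4000u;
    b.bpp       = (v & 0x80) ? 8 : 4;
    b.map_base  = 0x06000000u + ((v >> 8) & 0x1f) * 0x800u;
    b.width_px  = (size & 1) ? 512 : 256;
    b.height_px = (size & 2) ? 512 : 256;
    return b;
}

/* Bytes in the tilemap: one 16-bit entry per 8x8 tile. */
uint32_t map_bytes(const BgCnt *b)
{
    return (b->width_px / 8) * (b->height_px / 8) * 2;
}
```

For 0x4a0c this gives a 512x256 map at 0x06005000, which is 64 x 32 entries of 2 bytes each, so 4096 bytes, or 4096 / 4 = 1024 words for a 32-bit DMA. The setup values in the snippet are consistent; the problem is elsewhere.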

edit: this is what gcc generates. I can't write assembly and can barely read it, but as best as I can tell scratch never seems to be initialized? The first strh is BG1CNT; the second is BG1HOFS (not in the C fragment, but it is just set to 0), and the three str are the three DMA3 registers.

Code:

   .file   "menu.c"
   .text
   .align   2
   .global   clearmenubg
   .type   clearmenubg, %function
clearmenubg:
   @ args = 0, pretend = 0, frame = 4
   @ frame_needed = 0, uses_anonymous_args = 0
   @ link register save eliminated.
   mov   r3, #18944
   mov   r1, #67108864
   add   r3, r3, #12
   strh   r3, [r1, #10]   @ movhi
   mov   r3, #100663296
   add   r3, r3, #20480
   sub   sp, sp, #4
   mov   r2, #-2063597568
   str   sp, [r1, #212]
   add   r2, r2, #1024
   str   r3, [r1, #216]
   mov   r3, #0   @ movhi
   str   r2, [r1, #220]
   @ lr needed for prologue
   strh   r3, [r1, #20]   @ movhi
   add   sp, sp, #4
   bx   lr
   .size   clearmenubg, .-clearmenubg
   .ident   "GCC: (GNU) 4.0.2"
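The decimal immediates in the listing are easier to follow in hex. The following host-side sketch (plain C; the function and constant names are invented for illustration, the offsets are taken straight from the listing) rebuilds the values the assembly computes so they can be checked against the C fragment.

```c
#include <stdint.h>

/* mov r1, #67108864 -> 0x04000000, the start of the I/O register area */
enum { IO_BASE = 67108864 };

/* r3 for the first strh: mov r3, #18944 / add r3, r3, #12 */
uint32_t bg1cnt_value(void)  { return 18944u + 12u; }

/* r3 for str r3, [r1, #216]: mov r3, #100663296 / add r3, r3, #20480 */
uint32_t dma_dst_value(void) { return 100663296u + 20480u; }

/* r2 for str r2, [r1, #220]: mov r2, #-2063597568 / add r2, r2, #1024.
   The negative immediate is 0x85000000 when viewed as an unsigned 32-bit value. */
uint32_t dma_cnt_value(void) { return (uint32_t)-2063597568LL + 1024u; }

/* Fields of the combined 32-bit DMA count/control word. */
int      dma_enabled(uint32_t cnt)   { return (cnt >> 31) & 1; }
int      dma_is_32bit(uint32_t cnt)  { return (cnt >> 26) & 1; }
int      dma_src_fixed(uint32_t cnt) { return ((cnt >> 23) & 3) == 2; }
uint32_t dma_count(uint32_t cnt)     { return cnt & 0xffffu; }
```

So the three str instructions write the stack pointer to 0x040000D4 (source), 0x06005000 to 0x040000D8 (destination) and 0x85000400 to 0x040000DC (enable, 32-bit, fixed source, 1024 units). What is missing is any store of 0 to [sp]: `sub sp, sp, #4` reserves the slot and `str sp, [r1, #212]` publishes its address, but nothing ever writes to it.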

#68584 - Cearn - Wed Jan 25, 2006 9:41 pm

nmain wrote:
edit: this is what gcc generates. I can't write assembly and can barely read it, but as best as I can tell scratch never seems to be initialized?


It's something like that, yeah. The problem is that variables usually go directly into registers, which do not have addresses. What you can take the address of is the stack, which is also a place variables sometimes go. The code you quoted makes a place on the stack, and even puts its address into REG_DMA3SAD, but for some reason the value itself is never stored there.

A solution to this is to make the scratch variable or the DMA3COPY argument volatile.
This is what I use (well I don't because I have a faster fill, but this is what I would use if I wanted to do it with DMA), feel free to modify for your own purposes:

Code:
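INLINE void DMA_FILL(void *dst, volatile u32 src, u32 count, u32 ch, u32 mode)
{
   dma_mem[ch].cnt= 0;                  // stop the channel before reprogramming it
   dma_mem[ch].src= (const void*)&src;  // src is volatile, so its value really is stored in memory
   dma_mem[ch].dst= dst;
   dma_mem[ch].cnt= count | mode | DMA_SRC_FIX;   // fixed source: every unit re-reads src
}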


NOTE: the exact same thing happens with CpuSet and CpuFastSet if you want to use those for fills.
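The other option mentioned above, making the scratch variable itself volatile, could look roughly like the sketch below. It is written to run on a host machine, so the DMA channel is replaced by a hypothetical `mock_dma_fill32()` that simply re-reads the source word for every unit, the way a fixed-source transfer does. A host build cannot reproduce the original bug (the mock is an ordinary function call, which the compiler has to respect anyway); the sketch only shows the shape of the fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a fixed-source 32-bit DMA transfer:
   reads *src again for every word it writes. */
static void mock_dma_fill32(const volatile uint32_t *src, uint32_t *dst, size_t words)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < words; i++)
        dst[i] = *src;
}

/* Same shape as the original snippet, with scratch made volatile so the
   compiler must actually store the 0 to scratch's stack slot. */
void clear_map(uint32_t *map, size_t words)
{
    volatile uint32_t scratch = 0;
    mock_dma_fill32(&scratch, map, words);
}
```

On the GBA the call to the mock would be the three register writes, with the 4096-byte map from the first post giving `words = 1024`.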

#68589 - nmain - Wed Jan 25, 2006 10:52 pm

Thanks. :) As you suggested, declaring scratch as volatile fixed the problem.

I understand local variables can be just put into registers without ever giving them memory, but if I use the & operator it should put the variable on the stack somewhere, and if I initialize that variable to something the copy on the stack should get that value. The optimizer should be smart enough to know that if I use &var, I might dereference it later and expect it to have the right value! Maybe because DMA3COPY isn't a function call but just a macro that expands to placing &var in another memory location, gcc doesn't "catch" that the value of var on the stack needs to be updated? It works as is with -O0.

edit: Thinking about it a little more, gcc did do the right thing. Even if I store &scratch elsewhere, it shouldn't legally be valid after returning from this function, and gcc has no way to know that the write will cause a DMA transfer of that variable to occur *before* this function even returns.

#68596 - Cearn - Wed Jan 25, 2006 11:41 pm

nmain wrote:
I understand local variables can be just put into registers without ever giving them memory, but if I use the & operator it should put the variable on the stack somewhere, and if I initialize that variable to something the copy on the stack should get that value. The optimizer should be smart enough to know that if I use &var, I might dereference it later and expect it to have the right value! Maybe because DMA3COPY isn't a function call but just a macro that expands to placing &var in another memory location, gcc doesn't "catch" that the value of var on the stack needs to be updated? It works as is with -O0.

edit: Thinking about it a little more, gcc did do the right thing. Even if I store &scratch elsewhere, it shouldn't legally be valid after returning from this function, and gcc has no way to know that the write will cause a DMA transfer of that variable to occur *before* this function even returns.

You can't usefully store &scratch elsewhere: scratch is a local variable, so it stops existing once the function returns. Also, GCC has no knowledge of what DMA does. All it sees is that you put the address of a variable somewhere in memory, which it dutifully does, and because the DMA regs are volatile, it expects that address to be used somewhere. Whether the variable itself still exists by that time is none of GCC's concern: if you're stupid enough to use a local variable outside of the function it's defined in, that's your problem. This is in the C spec, btw, not just my own opinion, and I've made the mistake many a time myself.

You do have a point though. While it's true that the address is used, the actual value of scratch isn't, so GCC figures that the store of that value can be removed. What you're left with does follow a certain logic: the space for the variable is made and its address is used, but you never inform GCC you want to use the variable itself, so it dikes that out of the function like any other non-volatile thing. While this is very annoying, I guess it does make sense.

Hm, never thought of it that way before. Thanks!

PONDER: now, I wonder if you can modify the DMA definitions in such a way that they can be used as we'd like ...
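One possible direction, offered as an untested sketch and not as a known-good answer: put a compiler barrier between setting up the source and triggering the transfer. With GCC-style inline asm, an empty statement with a "memory" clobber tells the optimizer that memory may be read at that point, so it should have to store scratch first once its address has been handed out. Everything below is a host-side model; `MockDma`, `mock_dma_start()` and `dma_fill32()` are hypothetical names, and the generated code has not been checked on GCC 4.0.2 or on hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compiler-only barrier (GCC/Clang syntax, emits no instructions). */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Hypothetical host-side model of one DMA channel's registers. */
typedef struct {
    const volatile uint32_t *src;
    uint32_t *dst;
    size_t words;
} MockDma;

/* Stands in for the hardware reacting to the control-register write. */
static void mock_dma_start(const MockDma *ch)
{
    for (size_t i = 0; i < ch->words; i++)
        ch->dst[i] = *ch->src;   /* fixed source: same word every time */
}

void dma_fill32(MockDma *ch, uint32_t *dst, uint32_t value, size_t words)
{
    assert(ch != NULL && dst != NULL);

    uint32_t scratch = value;    /* deliberately NOT volatile */
    ch->src   = &scratch;
    ch->dst   = dst;
    ch->words = words;

    COMPILER_BARRIER();          /* scratch must be in memory before the trigger */
    mock_dma_start(ch);          /* on hardware: the store to the control register */
    COMPILER_BARRIER();          /* keep later accesses from moving above the transfer */

    ch->src = NULL;              /* don't leave a pointer to a dead local behind */
}
```

The appeal of this over a volatile scratch is that the fix lives in the DMA helper, so callers can pass an ordinary value. The price is that it relies on a compiler extension, whereas volatile is plain C.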

#68999 - RickA - Fri Jan 27, 2006 9:34 pm

Cearn wrote:

This is what I use (well I don't because I have a faster fill, but this is what I would use if I wanted to do it with DMA)


Would you mind sharing how you do your fill?

#69100 - Cearn - Sat Jan 28, 2006 2:01 pm

The memset16() and memset32() that come with tonc's code. Essentially they do what CpuFastSet does, but without its alignment/size restrictions. The difference in speed isn't necessarily that great (~15%), but the fact that DMA can screw up interrupts while CpuFastSet doesn't is also important.
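To give an idea of what a CPU word fill without those size restrictions looks like, here is a plain-C sketch. It is not tonc's implementation (the name `fill32_sketch` is made up for this example); it only illustrates the general shape: a main loop that writes eight words per iteration, plus a tail loop so the count does not have to be a multiple of eight. The destination still has to be word-aligned, since it is a `uint32_t` pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plain-C word fill: writes `value` to `words` consecutive
   32-bit words starting at dst. No DMA involved, so interrupts are not
   held off for the duration of the fill. */
void fill32_sketch(uint32_t *dst, uint32_t value, size_t words)
{
    assert(dst != NULL || words == 0);

    /* Main loop: eight words per iteration. */
    while (words >= 8) {
        dst[0] = value; dst[1] = value; dst[2] = value; dst[3] = value;
        dst[4] = value; dst[5] = value; dst[6] = value; dst[7] = value;
        dst += 8;
        words -= 8;
    }

    /* Tail: whatever is left when the count is not a multiple of eight. */
    while (words > 0) {
        *dst++ = value;
        words--;
    }
}
```

Clearing the tilemap from the first post would then be `fill32_sketch((uint32_t *)0x06005000, 0, 4096 / 4);` on the GBA, with no scratch variable whose address needs to survive the optimizer.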