gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

C/C++ > GCC 4.1.1 optimisations

#100819 - dushan42 - Wed Aug 30, 2006 11:06 pm

Hi,

I recompiled some of my old code with the latest devkit and discovered that my dma memset no longer worked if compiled with -O1 or higher:

Code:
inline void dmaset16(int c, u32 word, volatile void *dst, u32 halfWords)
{
   volatile u32 *dma = &REG_DM0SAD + 3*c;
   *dma++ = u32(&word);
   *dma++ = u32(dst);
   *dma = halfWords | (2 << 23) | (1 << 31);
}


It seems like the word argument gets corrupted (I was trying to fill a block with zeros and it got filled with junk instead).

After a bit of poking, I discovered that making 'word' volatile fixes the problem, however it doesn't make any sense. Why does it have to be volatile? Is it a bug in gcc 4.1.1?

I'm worried that there might be other (more subtle) bugs elsewhere in the code so I'd really like to understand what's going on here. Did anyone experience a similar problem?

As soon as I have some spare time I'll try to compare the generated assembly - until then please feel free to speculate :-).

#100822 - gmiller - Wed Aug 30, 2006 11:24 pm

I would be converned that the code even worked in the first place:

Code:

&dma++ = (u32)(&word);

should put the address of the local variable 'word' into the DMA register which does not look like what you want. I would have though you were passing the address of where to read from and this is not what I would expect. Your functional header stated that 'word' is a u32 so I would think your call would be:

Code:

short from[??]

dmaset16(0, (u32)&from, VRAM, sizeof(from) >> 1);



but I am not sure of your intent but the code as is should not work as I see it. But I have been wrong before (and will be again).

#100826 - dushan42 - Wed Aug 30, 2006 11:44 pm

I'm guessing the 'word' thing is a copy&paste error from the dmaset32 version of the function. It's only a style issue, functionally it makes no difference:

Code:
inline void dmaset16(int c, u16 halfWord, volatile void *dst, u32 halfWords)
{
   volatile u32 *dma = &REG_DM0SAD + 3*c;
   *dma++ = u32(&halfWord);
   *dma++ = u32(dst);
   *dma = halfWords | (2 << 23) | (1 << 31);
}


The code above has the same problem if 'halfWord' isn't volatile.

Re: passing pointer to local variable - I believe this is ok in this case as GBA's dma transfers interrupt the CPU (i.e. the function won't return until the memset is over).

#100829 - gmiller - Thu Aug 31, 2006 12:06 am

Quote:

dushan42


What you are saying is that the following two lines are the same?

Code:

u32 foo;
u32  foo1 = 0x6000000;

foo = foo1;
foo = (u32)(&foo);



And I say they are completely different.

#100831 - dushan42 - Thu Aug 31, 2006 12:15 am

Huh? Clearly not, but I have no idea what you're implying.

The code above worked fine with gcc 3.xxx but gcc 4.1.1 requires the word/halfWord argument to be volatile. That's the mystery here - I don't really see your point...

#100833 - gmiller - Thu Aug 31, 2006 12:26 am

I was trying to make the point that the first code should not have worked either and the new code with the volatile proably should not work as well. The volatile keyword just forces the compiler to generate extra instructions to refetch the value of a variable or what a pointer points to. Generally it is used where the values can change in ways that the code does not make happen. So any value that the compiler has saved might not be the current value so it needs to be forced to get the value again. Now the fact that it is working leads me to the impression that the calling sequence of the routine is leading me astray. Could you paste examples of how you are calling the function so we can see what you are passing.

#100836 - tepples - Thu Aug 31, 2006 12:51 am

Often, function arguments and other automatic local variables are kept in CPU registers. The DMA controller can't see into CPU registers. Therefore, any function argument or automatic variable used as source data for DMA should be volatile in order to keep it in RAM that the DMA controller can see.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#100837 - gmiller - Thu Aug 31, 2006 12:54 am

My bad also in missing the inline keyword ... sorry

#100841 - dushan42 - Thu Aug 31, 2006 1:51 am

Unfortunately the function is a bit obfuscated, let me explain what's going on.

I'm using DMA0 to clear a section of VRAM. It's setup so that during the transfer it only increments the destination address, so whatever u16 the source address is pointing to will get written repeatedly over the destination block. I'm using 16 bit transfer as VRAM doesn't support 32 bit access.

The parameters are:
int c - the DMA channel (in this case 0)
u16 halfWord - the u16 to be copied into the destination block
void* dst - pointer to the destination block
u32 halfWords - the size of the destionation block in half words (u16s).

Line 1:
Code:
volatile u32 *dma = &REG_DM0SAD + 3*c;


Works out the first control register for the specified dma channel. For channel 0, this means that 'dma == REG_DMA0SAD'.

Line 2:
Code:
*dma++ = u32(&halfWord);


Loads the DM0SAD (DMA0 source address register) with the address of the half word we want to copy all over the destination block (i.e. address of the 'halfWord' argument) . Incrementing dma makes it point to the next control register - REG_DMA0DAD.

Line 3:
Code:
*dma++ = u32(dst);


Loads the DM0DAD (DMA0 destination address) with the address of the destination block. Incrementing dma makes it point to REG_DMA0CNT.

Line 4:
Code:
*dma = halfWords | (2 << 23) | (1 << 31);


Initialise the REG_DMA0CNT (DMA0 control register).

Code:

REG_DMA0CNT (stolen from CowBite Spec):

31 30 29 28  27 26 25 24  23 22 21 20  19 18 17 16  15 14 13 12  11 10 09 08  07 06 05 04  03 02 01 00
 N  I  M  M   U  S  R  A   A  B  B  X   X  X  X  X   X  X  L  L   L  L  L  L   L  L  L  L   L  L  L  L

0-13  (L) = Number of words or halfwords to copy
21-22 (B) = Type of increment applied to destination address.
23-24 (A) = Type of increment applied to source address.
25    (R) = Repeat.
26    (S) = Size. If set, copy 32-bit quantities (words) If clear, copy 16-bit quantities (half words).
27    (U) = Unknown.
28-29 (M) = Start Mode.
30    (I) = IRQ.
31    (N) = Set this bit to enable DMA operation.


Line 4 sets up the following:

    0-13 (L) = 'halfWords'
    21-22 (B) = 0 (increment destination address after each copy)
    23-34 (A) = 2 (don't increment source address)
    26 (S) = 0 (16-bit mode)
    28-29 (M) = 0 (start immediately)
    31 (N) = 1 (start the DMA transfer!).


As soon as I write into REG_DMA0CNT, the DMA transfer kicks off and copies the value of halfWord all over the 'dst' block. During the transfer the CPU is halted.

I would have thought that taking an address of a variable should be a big enough hint for the compiler not to optimise it out. I can only guess what 'u32(&halfWord)' evaluates to when halfWord isn't volatile - it certainly doesn't point to anything resembling halfWord though.

I don't see what makes you think that this code is incorrect, it looks good to me and it's been working great for a long time. Making halfWord 'volatile' makes it work fine in gcc 4.1.1 - I'm just curious as to why this needs to be done.

#100843 - dushan42 - Thu Aug 31, 2006 1:57 am

tepples wrote:
Often, function arguments and other automatic local variables are kept in CPU registers. The DMA controller can't see into CPU registers. Therefore, any function argument or automatic variable used as source data for DMA should be volatile in order to keep it in RAM that the DMA controller can see.


That seems to be the case, but I don't think it should happen. If I take an address of a variable - the compiler can't just go 'oh, I only have this value in a register... ooh I'll just make up some random address for you!'. When I request a pointer to an argument passed via register, the compiler should allocate space on stack and copy the value there... or am I going crazy?

#100844 - tepples - Thu Aug 31, 2006 2:01 am

Don't try to second-guess the semantics that the compiler will apply, and don't make the compiler second-guess you either. If you intend for something to be in memory, put it in memory explicitly (keyword volatile). For a single int on the stack, it doesn't hurt performance to make it volatile, especially given that a large memset() that uses the 'stmia' instruction will be faster than DMA anyway because 'stmia' doesn't have to reread the source data word every time.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#100846 - dushan42 - Thu Aug 31, 2006 2:22 am

My understanding of 'volatile' was 'Hey compiler, that value might change at any time by things other than you. Don't keep it in a register'. Since 'halfWord' isn't being changed by anyone else, I didn't think volatile was appropriate. It does need to be in the memory but there's nothing stopping the compiler from caching it in a register.

Taking a pointer of something should be sufficient in telling the compiler that the value needs to be in the memory. Using volatile achieves that too - just seems like using a sledgehammer to open a walnut :-).

So I still think that gcc 4.1.1's optimisation is incorrect, but I agree with your advice & I'll use volatile more liberally - it can't hurt.

EDIT:
How about this code:

int x = 0;
scanf("%i", &x);
return x;

I dread to think what would happen if the compiler decided to optimise x into a register and just pass some random pointer to scanf...

#100854 - tepples - Thu Aug 31, 2006 4:59 am

dushan42 wrote:
My understanding of 'volatile' was 'Hey compiler, that value might change at any time by things other than you. Don't keep it in a register'.

Replace "change" with "change or be read" for the true meaning of 'volatile'.

Quote:
Taking a pointer of something should be sufficient in telling the compiler that the value needs to be in the memory.

To be in the memory at what time? How long does it need to stay in the memory? GCC doesn't know that writes to DM0CNT halt the CPU and start the equivalent of a memcpy() or memset().

Quote:
How about this code:

int x = 0;
scanf("%i", &x);
return x;

I dread to think what would happen if the compiler decided to optimise x into a register and just pass some random pointer to scanf...

But then you're calling a function, and C guarantees that variables will be in their "architectural" state at the start and end of a function call. Not so with a write to a memory-mapped register with behavior that resembles function call; for that, you need to use 'volatile' more liberally.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#100856 - dushan42 - Thu Aug 31, 2006 5:17 am

Makes sense. Don't you hate it when improvements in a compiler expose your bugs? :)

You're abslutely right, I don't know why assumed that volatile only refers to external writes - doh!

My faith in gcc is restored :)