gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

Beginners > Changing screen fading during hblank

#53577 - Dan_attacker - Fri Sep 09, 2005 1:29 pm

In my game, I have the top part of the screen faded while the lower part stays bright. I'm doing this with a VCount interrupt that changes REG_COLY partway through drawing the screen, so that everything below that point is bright. It works perfectly on VisualBoyAdvance, but on actual hardware the left end of the first line whose brightness I change is still dark. About 8 pixels stay dark, and that dark edge also wiggles (kind of hard to explain). I've tried disabling all other interrupts, but the problem is still there. Here's the code that runs when the VCount interrupt is triggered...
Code:
void dialogsprdim()
{
   dialogirqon=1;
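   WaitForHblank();          // wait for HBlank (helper not shown)
   SetMode(MODE_0 | H_BLANK_OAM | OBJ_ENABLE | OBJ_MAP_2D | BG0_ENABLE | BG1_ENABLE | BG2_ENABLE | BG3_ENABLE);
   REG_COLY = 0;             // brightness coefficient 0: no fade from here down
   while (REG_VCOUNT!=161)   // busy-wait until line 161 (inside VBlank)
   {
   }
   SetMode(MODE_0 | H_BLANK_OAM | OBJ_ENABLE | OBJ_MAP_1D | BG0_ENABLE | BG1_ENABLE | BG2_ENABLE | BG3_ENABLE);
   REG_COLY = sintensity2;   // restore the fade level for the top of the next frame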
   kramWorker();
   dialogirqon=0;
}

If I can't resolve this problem then I will have to do fading using the palette. Any ideas?

#53592 - tepples - Fri Sep 09, 2005 4:49 pm

Is the "wiggling" the same sort of wiggling that happens at the top left corner of the status bar when you play Super Mario Bros. 3 on NES hardware? If so, I know the problem: You're using an hblank interrupt to make a raster effect, and because the code for the first line takes longer than one hblank period, the effect happens during draw time.

Try setting a vcount interrupt that makes a one-shot hblank DMA.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#53664 - Dan_attacker - Sat Sep 10, 2005 12:33 pm

Quote:

Try setting a vcount interrupt that makes a one-shot hblank DMA.

How would I go about doing this? Thanks in advance.

#53667 - Dan_attacker - Sat Sep 10, 2005 1:16 pm

Ok, it works now. Is this what you wanted me to do?
Code:
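u16 fadeset = (BIT0|BIT1|BIT2|BIT6|BIT7);  // BLDCNT: BG0-BG2 as targets, effect 3 = brightness decrease
void dialogsprdim()
{
   REG_DMA3SAD = (u32)&fadeset;      // source: the BLDCNT value above
   REG_DMA3DAD = 0x4000050;          // destination: BLDCNT (blend control register)
   REG_DMA3CNT = 1 | DMA_16HBLANK;   // one 16-bit unit, copied at the next HBlank, no repeat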
   SetMode(MODE_0 | H_BLANK_OAM | OBJ_ENABLE | OBJ_MAP_2D | BG0_ENABLE | BG1_ENABLE | BG2_ENABLE | BG3_ENABLE);
}

If there is a more correct way of doing it, please feel free to share it. Thanks!
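Dan's one-shot transfer can be unpacked bit by bit. The sketch below builds the 32-bit value written to DMA3CNT from the field layout in the GBA hardware documentation (unit count in the low half, control bits in the high half). The macro and function names are hypothetical, made up for this example rather than taken from any library; Dan's DMA_16HBLANK presumably expands to the enable and HBlank-timing bits.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical names for the DMAxCNT fields, as seen through a 32-bit write:
   bits 0-15 = unit count, bits 16-31 = control (DMAxCNT_H shifted up 16). */
#define DMACNT_DST_FIXED   (2u << 21)  /* destination address stays put */
#define DMACNT_DST_RELOAD  (3u << 21)  /* increment, reload on every repeat */
#define DMACNT_REPEAT      (1u << 25)  /* re-arm after every trigger */
#define DMACNT_32BIT       (1u << 26)  /* 32-bit units instead of 16-bit */
#define DMACNT_AT_HBLANK   (2u << 28)  /* start timing: HBlank */
#define DMACNT_ENABLE      (1u << 31)

/* Build a DMAxCNT value: 'count' units plus the requested control flags. */
static uint32_t dma_cnt(uint32_t count, uint32_t flags)
{
    assert(count >= 1 && count <= 0xFFFF);  /* the count field is 16 bits wide */
    return DMACNT_ENABLE | flags | count;
}
```

Dan's transfer corresponds to `dma_cnt(1, DMACNT_AT_HBLANK)`, i.e. 0xA0000001: one 16-bit unit copied at the next HBlank. Because the repeat bit is clear, the channel switches itself off after that single copy, which is what makes it one-shot.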

#61528 - curious - Mon Nov 21, 2005 6:19 am

tepples wrote:
Is the "wiggling" the same sort of wiggling that happens at the top left corner of the status bar when you play Super Mario Bros. 3 on NES hardware? If so, I know the problem: You're using an hblank interrupt to make a raster effect, and because the code for the first line takes longer than one hblank period, the effect happens during draw time.


I have this problem too, and I'd like to ask about it.

Part of my program walks through slices of a colour cube, in double buffered MODE_4. As that's an indexed mode, I intercept the HBlank, and rewrite the palette on each line. Works great in Boycott Advance, but I get this same incomplete execution of the ISR before the line completes - but again, only on the first line.

I'm not sure of two things. First, why does only the first line appear to be slow? Second, will a DMA transfer help me? I'm not sure if there'll be a speed saving going from EWRAM to PAL RAM as compared to just unrolling the loop, which is what I had. But I couldn't find cycle timings in Cowbite for direct assignment, and I don't know a lot about ARM ( my program is in Thumb, by the way ).

Is my only alternative to construct a new palette in EWRAM while the line is being painted by the hardware, and use a repeating DMA triggered on HBlank to blast it across, with my fingers crossed that it all syncs up properly?

Finally, I have one last question - I apologise if this has been covered in a FAQ or something, but I must have missed it - My DMA transfers / bulk loops & assignments, etc, use 32 bit values where possible. However, I'm told that as I'm writing my code in Thumb, and hence, I think, residing in EWRAM land, my transfers will really be all 16 bit. Am I getting any value here? Do I need to rebuild my code into ARM and sock it into IWRAM?

Thanks for your time - I hope this was alright for a first post.

Edit: Here's the ISR - it's handed off to by libgba's Interrupt Switchboard:
Code:
void isrHCHBlank()
{
        REG_IE = 0;
        // Lines 16 .. 143 are set up as simple gradients
        // with a step size of four ( double ) pixels.  As this is
        // firing in the HBlank, it needs to be done on the line
        // *before* the line you want the colours to affect.
        if( REG_VCOUNT >= 15 && REG_VCOUNT <= 142 )
        {
                u32 baseColour = blueSlice | ( ( ( REG_VCOUNT - 15 ) >> 2 ) << 5 ) | ( ( ( REG_VCOUNT - 15 ) >> 2 ) << 21 );
                ((u32 *)BG_COLORS)[  0 ] = baseColour | 0x00010000;
                ((u32 *)BG_COLORS)[  1 ] = baseColour | 0x00030002;
                ((u32 *)BG_COLORS)[  2 ] = baseColour | 0x00050004;
                ((u32 *)BG_COLORS)[  3 ] = baseColour | 0x00070006;
                ((u32 *)BG_COLORS)[  4 ] = baseColour | 0x00090008;
                ((u32 *)BG_COLORS)[  5 ] = baseColour | 0x000B000A;
                ((u32 *)BG_COLORS)[  6 ] = baseColour | 0x000D000C;
                ((u32 *)BG_COLORS)[  7 ] = baseColour | 0x000F000E;
                ((u32 *)BG_COLORS)[  8 ] = baseColour | 0x00110010;
                ((u32 *)BG_COLORS)[  9 ] = baseColour | 0x00130012;
                ((u32 *)BG_COLORS)[ 10 ] = baseColour | 0x00150014;
                ((u32 *)BG_COLORS)[ 11 ] = baseColour | 0x00170016;
                ((u32 *)BG_COLORS)[ 12 ] = baseColour | 0x00190018;
                ((u32 *)BG_COLORS)[ 13 ] = baseColour | 0x001B001A;
                ((u32 *)BG_COLORS)[ 14 ] = baseColour | 0x001D001C;
        }
        else ( ( u32 * ) BG_COLORS )[ 0 ] = 0x7FFF0000;

        REG_IF |= IE_HBL;
        REG_IE = 1;
}

Curious.
_________________
I thought they was prunes!

#61539 - DekuTree64 - Mon Nov 21, 2005 10:26 am

Dan_attacker wrote:
If there is a more correct way of doing it, please feel free to share it. Thanks!

Nope, that's about as good as it gets. As for why the problem happened in the first place, it's because the VCount interrupt triggers at the START of the line. That means it has already started drawing, so the first few pixels get missed.
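To put numbers on this: per the GBA hardware documentation, a scanline lasts 1232 CPU cycles, of which 960 are spent drawing the 240 visible pixels (4 cycles per pixel) and 272 are HBlank. Since the VCount interrupt fires as the line starts drawing, a register write that lands N cycles later has already missed about N/4 pixels. The helper below is a sketch of that arithmetic; its names are made up for the example.

```c
#include <stdint.h>
#include <assert.h>

#define CYCLES_PER_PIXEL  4u
#define VISIBLE_PIXELS    240u
#define HDRAW_CYCLES      (VISIBLE_PIXELS * CYCLES_PER_PIXEL)  /* 960 */
#define SCANLINE_CYCLES   1232u                                 /* 960 + 272 HBlank */

/* Pixels of the current line already drawn when a write lands 'cycles'
   after the line started (i.e. after the VCount match). */
static uint32_t pixels_already_drawn(uint32_t cycles)
{
    uint32_t px = cycles / CYCLES_PER_PIXEL;
    return px > VISIBLE_PIXELS ? VISIBLE_PIXELS : px;  /* HBlank: whole line done */
}
```

For example, `pixels_already_drawn(32)` is 8, so a write arriving 32 cycles after the VCount match would leave roughly the 8-pixel strip Dan describes. The 32 is illustrative, not a measurement; interrupt entry, a dispatcher and a few register accesses can plausibly add up to that, and if the delay varies from frame to frame the edge of the strip moves, which may be the wiggle.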

curious wrote:
Part of my program walks through slices of a colour cube, in double buffered MODE_4. As that's an indexed mode, I intercept the HBlank, and rewrite the palette on each line. Works great in Boycott Advance, but I get this same incomplete execution of the ISR before the line completes - but again, only on the first line.

This is actually the opposite problem from what the OP was having :)
The HBlank interrupt happens AFTER the line is drawn, but before moving onto the next line. So in your ISR, if REG_VCOUNT is 0, that means line 0 has already been drawn, and whatever changes you make will be visible on line 1.

Usually that means your effect will be off by one line from what you intended, plus the first line being missed entirely.
To correct for the off-by-one, just add 1 to VCount before plugging it into your formula. To fix the first line, wrap your vcount around to 0 if it's the last line (therefore setting up the first line for next frame).
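Both corrections fit in a small helper that maps the line whose HBlank just started to the line the ISR should prepare. This is a sketch assuming 160 visible lines and VCOUNT running 0-227; the function name and the -1 return for VBlank lines are conventions invented for this example.

```c
#include <assert.h>

#define SCREEN_LINES 160   /* visible lines; VCOUNT itself runs 0..227 */

/* In an HBlank ISR, VCOUNT names the line that has just been drawn.
   Return the line the ISR should set up, or -1 during VBlank. */
static int hblank_target_line(unsigned vcount)
{
    if (vcount >= SCREEN_LINES)
        return -1;                 /* VBlank: no visible line follows */
    if (vcount == SCREEN_LINES - 1)
        return 0;                  /* last line: set up line 0 of the next frame */
    return (int)vcount + 1;        /* otherwise: the next line down */
}
```

With this mapping the ISR's formula can be written in terms of the line being prepared, so "lines 16 to 143" in the code really means lines 16 to 143 on screen.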

I think the HBlank interrupt keeps firing during VBlank, so you'll probably want to bail out if VCount is >= 160. Something like:
Code:

void isrHCHBlank()
{
    u32 vcount = REG_VCOUNT;
    if (vcount < 160)
    {
        vcount += 1;       // Correct off-by-one
        if (vcount == 160)
            vcount = 0;    // Set up first line for next frame

        // ...do stuff like before, but using vcount instead of REG_VCOUNT
    }
    REG_IF = IE_HBL;   // Plain '=', not '|=': writing 1 clears a bit, so |= would also clear other pending bits
    // Also don't need to mess with REG_IE, because interrupts
    // are automatically disabled by the CPU
}


Here's another thread on HBlank stuff, if you feel like reading.

Quote:
My DMA transfers / bulk loops & assignments, etc, use 32 bit values where possible. However, I'm told that as I'm writing my code in Thumb, and hence, I think, residing in EWRAM land, my transfers will really be all 16 bit. Am I getting any value here?

ARM or THUMB code won't affect the speed of copying data around. That's determined by the waitstates of the memory areas being accessed.

A 32-bit transfer will be done as 2 16-bit transfers by the hardware, but it will still be faster than doing 2 16-bit transfers by yourself, since each load instruction takes 2 cycles in addition to the memory transfer time, and stores are 1+transfer time.
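As a rough check of that claim, the sketch below counts cycles using only the figures quoted above: a load costs 2 cycles plus its memory access time, a store 1 cycle plus its access time, and a 32-bit access on a 16-bit bus costs two 16-bit accesses. It ignores instruction fetches and everything else, so the absolute numbers are illustrative; the point is the difference. The function names are hypothetical.

```c
#include <assert.h>

/* Simplified model: 'src16' and 'dst16' are the cycles for one 16-bit access
   to the source and destination memory (both assumed to be 16-bit buses,
   as with EWRAM -> palette RAM). Cost of copying one 32-bit word: */

static unsigned copy_word_as_32(unsigned src16, unsigned dst16)
{
    /* one 32-bit load + one 32-bit store, each split into two bus accesses */
    return (2 + 2 * src16) + (1 + 2 * dst16);
}

static unsigned copy_word_as_16(unsigned src16, unsigned dst16)
{
    /* two 16-bit loads + two 16-bit stores, one bus access each */
    return 2 * (2 + src16) + 2 * (1 + dst16);
}
```

Whatever access times you plug in, the 16-bit version comes out 3 cycles per word slower in this model: the bus time is identical, and only the per-instruction overhead is paid twice.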

For DMA, check the DMA timing chart in Cowbite, but generally just use 32-bit unless you have a reason not to.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

#61540 - curious - Mon Nov 21, 2005 11:12 am

Edit: I just found another odd artifact in my program. Might be best to hold off looking into this any further until I run it down. Cheers!

DekuTree64 wrote:
The HBlank interrupt happens AFTER the line is drawn, but before moving onto the next line. So in your ISR, if REG_VCOUNT is 0, that means line 0 has already been drawn, and whatever changes you make will be visible on line 1.


Thanks for taking the time out to have a look at this. :-) Unfortunately, I'm still not quite sure where I'm going wrong! D-:

I understand what you're saying about the HBlank being fired with the line in VCOUNT already painted in, but I think I've already compensated for this - the lines I want affected are 16 to 143, and so the ISR will fire according to:
Code:
        if( REG_VCOUNT >= 15 && REG_VCOUNT <= 142 )
I agree with what you said about not rewriting the palette for VCOUNT >= 160. In fact, it really only needs to be written after the "crazy colour" section is finished ( VCOUNT == 143 ) and once before starting the effect. I will fix that when I get home.

I'm still not sure how this is causing just the first three or four pixels to not get their colour though - especially as those first four pixels are all the same colour ( check the note in the comment about the eight pixel blocks ). I can only surmise that the palette is not being updated quick enough, but surely the first palette entry, if anything, should go through on time?

I will have a look at the other article when I get home, but thanks again for all the help ( particularly for clearing up the word size thing ). Would it possibly help if I posted a rom of my program ( just this colour blending section ) online for someone to look at?

#61541 - Cearn - Mon Nov 21, 2005 11:18 am

curious wrote:
I'm not sure of two things - why does only the first line appear to be slow?

By first line, do you mean the first scanline (i.e., at the top) or the first line of code in the isr?

curious wrote:
Second, will a DMA transfer help me? I'm not sure if there'll be a speed saving going from EWRAM to PAL RAM as compared to just unrolling the loop, which is what I had. But I couldn't find cycle timings in Cowbite for direct assignment, and I don't know a lot about ARM ( my program is in Thumb, by the way ).

Is my only alternative to attempt to construct a new palette in EWRAM as the line is being painted by hardware, and use a repeating DMA triggered on HSync to blast it across, with my fingers crossed it'll all synch up properly?

There are a few simple ways of speeding up this code, even without resorting to ARM code. First, don't use REG_VCOUNT everywhere. It is volatile, so it is re-read from the hardware every time you use it. Capture it in a variable and use that instead. Also, in THUMB code ORR only works register-to-register, while ADD also has immediate forms, so use ADD where you can.

Other things that can be taken into consideration: ARM CPUs don't like large constants like 0x00050004, which have to be built up over several instructions (or placed in a literal pool and loaded from there, which is what actually happens). Using incremental offsets is faster. GCC doesn't particularly care for #defined arrays like BG_COLORS either: creating a separate pointer and incrementing that should be faster.

So, instead of
Code:
u32 baseColour = blueSlice | ( ( ( REG_VCOUNT - 15 ) >> 2 ) << 5 ) | ( ( ( REG_VCOUNT - 15 ) >> 2 ) << 21 );
    ((u32 *)BG_COLORS)[  0 ] = baseColour | 0x00010000;
    ((u32 *)BG_COLORS)[  1 ] = baseColour | 0x00030002;
...

try
Code:
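u32 vc= REG_VCOUNT-15;          // read the volatile register once
u32 clr= blueSlice | ((vc>>2)<<5) | ((vc>>2)<<21);
clr += 0x00010000;              // entry 0: red 0 (low half), red 1 (high half)
u32 clrofs= 0x00020002;         // each step adds 2 to red in both halves
u32 *pal= (u32*)BG_COLORS;
int ii;

for(ii=0; ii<15; ii++)          // the same 15 words as before (unroll if you like)
{
    *pal++ = clr;
    clr += clrofs;
}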

Now, this should have been faster, but now that I check the asm code, it seems that GCC is particularly inept at optimising it ... even when you do it for it. It seems that it just ignores what I say and uses a lut/load for both the palette and color offset regardless.
Code:
@ generated asm for one step (more for background info than anything else;
@ feel free to skip)
@   *pal++ = clr;
@   clr += ofs;

@ what I get
    ldr r3, .L65+8  @ r3=ofs[ii],           S+N+I
    add r1, r2, r3  @ r1= clr0+ofs,         S
    ldr r3, .L65+12 @ r3= &BG_COLORS[ii]    S+N+I
    str r1, [r3]    @ *r3= clr0+ofs         2N
@ total: 3S+4N+2I

@ should be:
@ r0=pal, r1=clr, r2=ofs
    stmia   r0!, {r1}       2N
    add     r1, r1, r2      1S
@ 1S+2N

For code run from EWRAM this comes down to 23 (maybe even 25 due to loading u32 through the 16 bit bus) vs 9 cycles. Quite a difference I should say. If you can do it in asm, go for it.
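Swapping OR for ADD in the snippet above is only safe because the fields never overlap: the red bits (0-4 and 16-20) of the base colour are zero, and the largest red value written, 29, still fits in five bits, so no carry can spill into green. The sketch below checks that both versions produce the same 15 palette words. It assumes blueSlice holds the same blue value in both halves (bits 10-14 and 26-30), which is a guess, since the thread doesn't show how blueSlice is built; the function names are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

#define PAL_WORDS 15   /* 15 u32 writes = 30 colours, as in the original ISR */

/* Original approach: OR a per-entry constant into the base colour. */
static void row_with_or(uint32_t *pal, uint32_t blueSlice, uint32_t vc)
{
    uint32_t base = blueSlice | ((vc >> 2) << 5) | ((vc >> 2) << 21);
    for (uint32_t k = 0; k < PAL_WORDS; k++)
        pal[k] = base | ((2 * k + 1) << 16) | (2 * k);  /* red = 2k and 2k+1 */
}

/* Incremental approach: start at entry 0 and add a fixed offset each step. */
static void row_with_add(uint32_t *pal, uint32_t blueSlice, uint32_t vc)
{
    uint32_t clr = blueSlice | ((vc >> 2) << 5) | ((vc >> 2) << 21);
    clr += 0x00010000;            /* high half starts at red = 1 */
    for (uint32_t k = 0; k < PAL_WORDS; k++) {
        pal[k] = clr;
        clr += 0x00020002;        /* red += 2 in both halves */
    }
}
```

If blueSlice (or any other term) ever put bits into the red fields, the two versions would diverge, so the ADD trick depends on that layout.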
As for precalculating+HDMA: you won't win much in terms of cycles (perhaps even lose some), but you will win because of where those cycles are spent: in the much larger VBlank instead of the skimpy HBlank. But you'll have to add two VCount interrupts to make sure it starts and stops at the correct time.

curious wrote:
Finally, I have one last question - I apologise if this has been covered in a FAQ or something, but I must have missed it - My DMA transfers / bulk loops & assignments, etc, use 32 bit values where possible. However, I'm told that as I'm writing my code in Thumb, and hence, I think, residing in EWRAM land, my transfers will really be all 16 bit. Am I getting any value here? Do I need to rebuild my code into ARM and sock it into IWRAM?

What Deku said: a single 32-bit transfer will always be faster than doing it as two 16-bit ones, because you'll use fewer instructions. Plus there may be some fringe benefits too.
Putting things in IWRAM will always be faster than anywhere else; whether you need to is pretty much up to you. If it works with the current code, fine. If it doesn't, try the ARM+IWRAM thing and see how that works out.

#61579 - tepples - Mon Nov 21, 2005 8:50 pm

You'll definitely want to do this sort of thing as hblank DMA instead of an hblank ISR. Use a vcount interrupt on the line above the gradient to start the HDMA, and then use another vcount interrupt on the line below the gradient to stop the HDMA.
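A sketch of that arrangement for curious's gradient, under several assumptions: the 128-row table is rebuilt (e.g. in the VBlank handler) whenever blueSlice changes, the function names are hypothetical, the register addresses are the standard GBA ones written out by hand instead of coming from libgba, and enabling the VCount interrupt and hooking hdma_vcount_isr() into the dispatcher are left out. On every HBlank the repeating DMA copies the next 15-word row into the same palette slots (destination reload), while the source keeps advancing through the table.

```c
#include <stdint.h>
#include <assert.h>

/* Standard GBA I/O addresses, spelled out so the sketch stands alone. */
#define REG_DISPSTAT (*(volatile uint16_t *)(uintptr_t)0x04000004)
#define REG_VCOUNT   (*(volatile uint16_t *)(uintptr_t)0x04000006)
#define REG_DMA3SAD  (*(volatile uint32_t *)(uintptr_t)0x040000D4)
#define REG_DMA3DAD  (*(volatile uint32_t *)(uintptr_t)0x040000D8)
#define REG_DMA3CNT  (*(volatile uint32_t *)(uintptr_t)0x040000DC)
#define BG_PALETTE   ((volatile uint32_t *)(uintptr_t)0x05000000)

#define FIRST_LINE 16                            /* gradient covers lines 16..143 */
#define LAST_LINE  143
#define ROWS       (LAST_LINE - FIRST_LINE + 1)  /* 128 rows */
#define ROW_WORDS  15                            /* 30 colours per row */

/* enable | HBlank timing | 32-bit | repeat | dest increment+reload | count */
#define HDMA_ROW_CNT (0x80000000u | 0x20000000u | 0x04000000u | \
                      0x02000000u | 0x00600000u | ROW_WORDS)

static uint32_t gradient[ROWS][ROW_WORDS];

/* Precompute every row with the same colour layout as curious's ISR. */
static void build_gradient(uint32_t blueSlice)
{
    for (uint32_t row = 0; row < ROWS; row++) {
        uint32_t g = row >> 2;
        uint32_t base = blueSlice | (g << 5) | (g << 21);
        for (uint32_t k = 0; k < ROW_WORDS; k++)
            gradient[row][k] = base | ((2 * k + 1) << 16) | (2 * k);
    }
}

/* Set the VCount-match line (DISPSTAT bits 8-15). */
static void set_vcount_match(uint16_t line)
{
    REG_DISPSTAT = (uint16_t)((REG_DISPSTAT & 0x00FFu) | ((uint32_t)line << 8));
}

/* VCount ISR: start the HDMA on the line above the gradient,
   stop it once the last row has been copied. */
void hdma_vcount_isr(void)
{
    uint16_t vc = REG_VCOUNT;
    if (vc == FIRST_LINE - 1) {
        REG_DMA3CNT = 0;                             /* channel idle before reprogramming */
        REG_DMA3SAD = (uint32_t)(uintptr_t)gradient;
        REG_DMA3DAD = (uint32_t)(uintptr_t)BG_PALETTE;
        REG_DMA3CNT = HDMA_ROW_CNT;                  /* first copy: this line's HBlank */
        set_vcount_match(LAST_LINE);
    } else if (vc == LAST_LINE) {
        REG_DMA3CNT = 0;                             /* row 127 went out in line 142's HBlank */
        BG_PALETTE[0] = 0x7FFF0000u;                 /* curious's colours outside the gradient */
        set_vcount_match(FIRST_LINE - 1);
    }
}
```

That is two VCount interrupts per frame instead of one HBlank interrupt per line, and the per-line work no longer has to race the beam.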

#61597 - curious - Mon Nov 21, 2005 11:48 pm

Thanks, Tepples! Seems like I'm going to be firing a hell of a lot of VCount interrupts though. :-(

In line with earlier suggestions, I tried assembling some of the loop. The following works great in the emulator:
Code:
void isrHCHBlank()
{
        u32 nvVCount = REG_VCOUNT;
        REG_IE = 0;

        // Lines 16 .. 143 are set up as simple gradients
        // with a step size of four ( double ) pixels.  As this is
        // firing in the HBlank, it needs to be done on the line
        // *before* the line you want the colours to affect.

        if( nvVCount >= 15 && nvVCount <= 142 )
        {
                u32 *palPointer = ((u32 *)BG_COLORS);
                u32 newColour, colourOffset;

                nvVCount -= 15;
                newColour = blueSlice | ( ( nvVCount >> 2 ) << 5 ) | ( ( nvVCount >> 2 ) << 21 );

                newColour += 0x00010000;
                colourOffset = 0x00020002;

                asm( "  mov     r0,     #16\n\t"
                     "teleport:\n\t"
                     "  stmia   %0!,    {%1}\n\t"
                     "  add     %1,     %1,     %2\n\t"
                     "  sub     r0,     r0,     #1\n\t"
                     "  cmp     r0,     #0\n\t"
                     "  bne     teleport"
                     : /* */
                     : "r" (palPointer), "r" (newColour), "r" (colourOffset)
                     : "r0" );
        }
        else ( ( u32 * ) BG_COLORS )[ 0 ] = 0x7FFF0000;

        REG_IF |= IE_HBL;
        REG_IE = 1;
}

... but it behaves very strangely on hardware. Namely, while the lines are coloured correctly, the VBlank interrupt which steps the value of blueSlice doesn't seem to be firing ( edit - this stops the program from progressing, as it counts off a certain number of blanks before continuing ).

Commenting out the block of assembly makes the problem go away. Bah! I hope I got the assembly thing correct too - I left the "outputs" line empty, although palPointer and newColour are both modified by the routine, because they were already in the "inputs" section ( edit: actually, this is kind of irrelevant, as I don't care what their values are after the loop completes ).

Curious.


Last edited by curious on Tue Nov 22, 2005 1:14 am; edited 1 time in total
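On the inline assembly question: GCC's documentation says input operands must not be modified unless they are also listed as outputs, so palPointer and newColour should be read-write ("+") operands, and the loop counter can be one too instead of a hard-coded r0. A named label like teleport can also break if the compiler emits the asm more than once (for example after inlining), so a numeric local label is safer, and "memory" and "cc" clobbers tell GCC that the block writes memory and changes the flags. The sketch below is a rewrite along those lines for Thumb-1, assuming GCC's default divided syntax for Thumb-1 inline asm, with a plain C loop on other targets; it has not been checked on hardware. The function name is hypothetical. Note that it writes the count you pass (15 for the C version), whereas the original asm loop wrote 16 words, and it drops the cmp, since the Thumb sub already sets the flags.

```c
#include <stdint.h>
#include <assert.h>

/* Write 'n' palette words starting at 'pal': clr, clr+ofs, clr+2*ofs, ... */
static void fill_palette_words(uint32_t *pal, uint32_t clr, uint32_t ofs, uint32_t n)
{
    assert(n > 0);                  /* the asm loop runs at least once */
#if defined(__thumb__) && !defined(__thumb2__)
    __asm__ volatile(
        "1:\n\t"
        "stmia  %0!, {%1}\n\t"      /* *pal++ = clr */
        "add    %1, %1, %3\n\t"     /* clr += ofs */
        "sub    %2, #1\n\t"         /* --n, sets the Z flag */
        "bne    1b"                 /* loop until n reaches zero */
        : "+l"(pal), "+l"(clr), "+l"(n)
        : "l"(ofs)
        : "memory", "cc");
#else
    for (uint32_t i = 0; i < n; i++) {   /* portable equivalent */
        pal[i] = clr;
        clr += ofs;
    }
#endif
}
```

In curious's ISR this would be called as `fill_palette_words(palPointer, newColour, colourOffset, 15)`.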

#61604 - DekuTree64 - Tue Nov 22, 2005 12:23 am

curious wrote:
... but it behaves very strangely on hardware. Namely, while the lines are coloured correctly, the VBlank interrupt which steps the value of blueSlice doesn't seem to be firing.

The REG_IF |= IE_HBL may be what's blocking your VBlank. When you write a 1 to a bit in IF, that bit is cleared. If the HBlank on the last line runs over until VBlank sets its bit, the |= will clear it, and the interrupt will never fire (normally it would wait until your HBlank returns, and then fire immediately).

In other words, just do REG_IF = IE_HBL to clear only the HBlank bit, leaving any other active bits to fire off after you return.
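The write-1-to-clear behaviour is easier to see in a small host-side model. The sketch below fakes the IF register with an ordinary variable and a write function that clears whichever bits are written as 1. The bit values (VBlank = bit 0, HBlank = bit 1) match the GBA's IE/IF layout, but the names are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

#define IRQ_VBLANK 0x0001u
#define IRQ_HBLANK 0x0002u

static uint16_t if_pending;   /* stands in for the hardware IF register */

/* IF semantics: writing 1 to a bit acknowledges (clears) it; 0 leaves it alone. */
static void if_write(uint16_t value)
{
    if_pending &= (uint16_t)~value;
}

/* REG_IF |= IRQ_HBLANK: reads every pending bit and writes them all back as 1s. */
static void ack_with_or(void)
{
    if_write((uint16_t)(if_pending | IRQ_HBLANK));
}

/* REG_IF = IRQ_HBLANK: acknowledges only the HBlank bit. */
static void ack_with_assign(void)
{
    if_write(IRQ_HBLANK);
}
```

If VBlank becomes pending while the last HBlank handler is still running, `ack_with_or()` clears both bits and the VBlank request is lost, while `ack_with_assign()` leaves it pending so it fires as soon as the handler returns.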

And just to check, is your REG_IE defined as address 0x4000200, or 0x4000208?
Normally 0x4000200 is called REG_IE, and 0x4000208 is called REG_IME, but if that was the case here I don't think your HBlank would work at all...

#61608 - curious - Tue Nov 22, 2005 1:03 am

DekuTree64 wrote:
The REG_IF |= IE_HBL may be what's blocking your VBlank. When you write a 1 to a bit in IF, that bit is cleared. If the HBlank on the last line runs over until VBlank sets its bit, the |= will clear it, and the interrupt will never fire (normally it would wait until your HBlank returns, and then fire immediately).


I'm not quite sure why this problem never manifested before ( I've been using the |= construction since I started on this program ), but this change fixed my problem, and better yet - combined with the assembly, fixed the amazing missing opening pixels! To be quite honest, I don't really understand the magic that drives REG_IF... it seems to work a little counterintuitively.

DekuTree64 wrote:
And just to check, is your REG_IE defined as address 0x4000200, or 0x4000208?


This build of libgba defines it as *(vu16*)(REG_BASE+0x200), which yes, should resolve to 0x4000200. Thanks again! I will play "Deku's Palace" on my ocarina tonight in your honor.

curious