gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

C/C++ > optimising calculations..

#12228 - yaustar - Tue Nov 04, 2003 8:36 pm

At he moment I have this could snippet for a game to draw a radar. Basically it uses the collision layer to draw the outline of the environment. It works great expect for one problem, it's bog slow. I have tried everything that I could think of but all I seem to do id make it slower... Help is desperately needed here.

Code:

//---Draw radar---//
inline void func_drawRadar(s16 var_x, s16 var_y)
{
    bool even_odd;//false is even, true is odd
    u8 var_newX, var_newY, var_diffY, var_diffX;//temp variables
    s16 var_curX, var_curY;//temp variables
   
    var_curX = ((var_x + 120) >> 3) - 18;
    var_curY = (((var_y + 80) >> 3) - 18) << 6;
   
    for(u8 a = 0; a < 39; a++)//y
    {
        for(u8 b = 0; b < 39; b++)//x
        {
            if((Tank_Out_Collision[var_snakeHeight][var_curX + var_curY + b] & 16)  == 16)
            {
                if(((b >> 1)<< 1) == b)
                {even_odd = false;}
                else
                {even_odd = true;}
               
                var_newX = (b >> 3);//divide by eight
                var_newY = (a >> 3);//divide by eight
                var_diffX = b - (var_newX << 3);//obtain the reminder
                var_diffY = a - (var_newY << 3);//obtain the reminder
               
                if(even_odd==false)
                {
                    radar[(var_newY << 8)+(var_diffY << 2) +
                        (var_newX << 5) + (var_diffX >> 1)] |= 128;
                }
                else
                {
                    radar[(var_newY << 8)+(var_diffY << 2) +
                        (var_newX << 5) + (var_diffX >> 1)] |= 32768;
                }
            }
        }
        var_curY+=64;
    }
    radar[585] |= 129;
}


Thnaks in advance
_________________
[Blog] [Portfolio]

#12231 - sajiimori - Tue Nov 04, 2003 9:44 pm

I don't understand the purpose of these calculations:
Code:

var_newX = (b >> 3);
var_diffX = b - (var_newX << 3);
(var_newX << 5) + (var_diffX >> 1)

Here are some other ways of doing the same thing:
Code:

var_diffX = b & 7;
(b >> 3 << 5) + (var_diffX >> 1)

// or...

((b & ~7) << 2) + ((b & 7) >> 1)

Why would you clear the low 3 bits of a number, shift it left 2 bits, then add the low 3 bits back in except shifted right 1 bit? I am just as confused about the calculations for the 'a' component.

Anyway, without understanding that, my contribution is limited to simple optimizations (I also renamed a lot of variables to aid readability). I also don't know what kinds of arrays 'radar' and 'Tank_Out_Collision' are, so I can't make optimizations using pointers (which are potentially huge).
Code:

void func_drawRadar(int arg_x, int arg_y)
{
    register int x, y_part;
    int y, curX, curY, curSum;

    curX = ((arg_x + 120) >> 3) - 18;
    curY = (((arg_y + 80) >> 3) - 18) << 6;
    curSum = curX + curY;

    for(y = 0; y < 39; y++)
    {
        y_part = (y >> 3 << 8) + ((y & 7) << 2);

        for(x = 0; x < 39; x++)
        {
            if(Tank_Out_Collision[var_snakeHeight][curSum + x] & 16)
            {
                radar[y_part + (x >> 3 << 5) + ((x & 7) >> 1)] |=
                    x & 1 ? 32768 : 128;
            }
        }
        curSum+=64;
    }
    radar[585] |= 129;
}

I think this code could be much, much faster if I had more information.

#12232 - yaustar - Tue Nov 04, 2003 10:14 pm

I try to explain as best as I can. The collision layer is a plain simple array, left to right and the next row down. The radar is done as a 64x64 sprite which is loaded in a completely different way.

This:
Code:

if(((b >> 1)<< 1) == b)
                {even_odd = false;}
                else
                {even_odd = true;}
               
                var_newX = (b >> 3);//divide by eight
                var_newY = (a >> 3);//divide by eight
                var_diffX = b - (var_newX << 3);//obtain the reminder
                var_diffY = a - (var_newY << 3);//obtain the reminder
               
                if(even_odd==false)
                {
                    radar[(var_newY << 8)+(var_diffY << 2) +
                        (var_newX << 5) + (var_diffX >> 1)] |= 128;
                }
                else
                {
                    radar[(var_newY << 8)+(var_diffY << 2) +
                        (var_newX << 5) + (var_diffX >> 1)] |= 32768;
                }


Was used to find the right value to edit within the spriteaccording to the collision layer. 'radar' is just a [4096] array which when finished edited is DMAed into the memory and Tank_out_Collision is also a [4096].
_________________
[Blog] [Portfolio]

#12233 - yaustar - Tue Nov 04, 2003 10:30 pm

No luck, it still goes at the same speed as ever. I think it may have something to do with the fact that the function is called once per Vblank doing something 39*39 times :(
However I am intrerested what this line means
Code:

radar[y_part + (x >> 3 << 5) + ((x & 7) >> 1)] |=
                    x & 1 ? 32768 : 128;


mainly the ~, ? and :
_________________
[Blog] [Portfolio]

#12234 - jma - Wed Nov 05, 2003 12:05 am

I'm working on more optimizations for you, but here are some biggies right away (to get rid of bad conditionals):

Quote:
Code:

if(((b >> 1)<< 1) == b)
{even_odd = false;}
else
{even_odd = true;}

Change this to:
Code:
even_odd = b & 1;

Also, don't do:
Quote:
Code:
 if(even_odd==false)
{
     radar[(var_newY << 8)+(var_diffY << 2) +
         (var_newX << 5) + (var_diffX >> 1)] |= 128;
}
else
{
     radar[(var_newY << 8)+(var_diffY << 2) +
         (var_newX << 5) + (var_diffX >> 1)] |= 32768;
}

Instead do:
Code:
int adder[2] = { 32768, 128 };
...
radar[(var_newY << 8)+(var_diffY << 2) +
         (var_newX << 5) + (var_diffX >> 1)] |= adder[even_odd];

Also, for your remainder calculations, do this:
Code:
var_diffX &= 7;
var_diffY &= 7;

See what that does right now. But the best optimization you can make is in getting rid of that damn 39*39 loop... see if you can crunch that down some. I'm sure not all 1600 indexes need updating each frame. Instead, try having a flag to state whether one needs updating or not.

Jeff
_________________
massung@gmail.com
http://www.retrobyte.org

#12235 - yaustar - Wed Nov 05, 2003 12:19 am

Not much difference but I have a feeling it help in the long (thanks guys).

The 39*39 loop is the most annoying thing, but I will still have to use it at least once every eight cycles (giving for constant movement in one direction) Would this make much of a difference in terms of performence?
_________________
[Blog] [Portfolio]

#12236 - jma - Wed Nov 05, 2003 12:24 am

Well, here is some modified code (but keeping var names, etc):
Code:
inline void func_drawRadar(s16 var_x, s16 var_y)
{
int b,adder[2] = { 32768,128 };
register u8 var_newX, var_newY, var_diffY, var_diffX;
s16 var_curX, var_curY;//temp variables
   
var_curX = ((var_x + 120) >> 3) - 18;
var_curY = (((var_y + 80) >> 3) - 18) << 6;
   
for(u8 a = 0; a < 39; a++)//y
{
   for(u8 b = 0; b < 39; b++)//x
   {
       if((Tank_Out_Collision[var_snakeHeight][var_curX + var_curY + b] & 16)  == 16)
       {
   var_newX = (b << 5) & (~0x7); // div 8 mask 3 bits
   var_newY = (a << 2) & (~0x7); // div 8 mask 3 bits

   var_diffX = (var_diffX & 7) >> 1; // rem / 2
   var_diffY = (var_diffY & 7) << 2; // rem * 4

   // speedy if in registers
   var_newX += var_diffX;
   var_newY += var_diffY;

   radar[var_newY + var_newX] = adder[b & 1];
        }
    }
   var_curY+=64;
}
radar[585] |= 129;
}

If you email a screenshot of the effect, perhaps more help on limitting your loop would be possible. Good luck!

Jeff
_________________
massung@gmail.com
http://www.retrobyte.org

#12237 - yaustar - Wed Nov 05, 2003 12:28 am

I email you the bins, one without the radar and one with. :)
_________________
[Blog] [Portfolio]

#12238 - sajiimori - Wed Nov 05, 2003 2:48 am

yaustar,

? and : constitute the conditional operator. "a ? b : c" means "if a then b else c". ~ is the bitwise inverse operator.

Your code is slow due to the complexity of the code inside the loop. It is perfectly possible to do a 39*39 loop every frame, as long as the body of the loop is simple enough.

jma's code incorporates some of my optimizations, but not all. The 'adder' array is just obfuscation and does not make the code faster. Also, be careful because some of his code is not equivalent to yours.

Your explanations are somewhat incomplete, because you do not describe the actual types of your arrays. I gather that 'radar' is a u8[], but I still don't know the type of Tank_Out_Collision.

I still think your math is very odd. If you could explain exactly how your algorithm is supposed to work, I might be able to help you more. Until then, I'm at a loss.

#12239 - sajiimori - Wed Nov 05, 2003 3:30 am

Why can't you do something like this? It should run plenty fast.
Code:

void draw_radar(int map_x, map_y)
{
    register u8* tile = &Tank_Out_Collision[var_snakeHeight]
                [
                    ((map_x + 120) >> 3) - RADAR_WIDTH/2 +
                    ((((map_y + 80) >> 3) - RADAR_HEIGHT/2) << 6)
                ];

    register u8* pixel = radar;

    for(int y = 0; y < RADAR_HEIGHT; ++y)
    {
        for(int x = 0; x < RADAR_WIDTH; ++x)
        {
            if(*tile++ & SOLID_BIT)
                *pixel++ = SOLID_COLOR;
            else
                *pixel++ = CLEAR_COLOR;
        }

        pixel += 64 - RADAR_WIDTH;
        tile += MAP_WIDTH - RADAR_WIDTH;
    }
}

#12244 - yaustar - Wed Nov 05, 2003 2:06 pm

Code:

const u16 Tank_Out_Collision[3][4096] //colllision layer (64x64)
const u16 radarBlank[2048] /sprite (64x64)


Basically, all that code in that loop is to work out where the pixel I want to edit is stored within the data array of radar which is stored in a pian in the ass way to edit (8x8 pixel blocks).

I dont see how this would work though
Code:

for(int x = 0; x < RADAR_WIDTH; ++x)
        {
            if(*tile++ & SOLID_BIT)
                *pixel++ = SOLID_COLOR;
            else
                *pixel++ = CLEAR_COLOR;
        }

        pixel += 64 - RADAR_WIDTH;
        tile += MAP_WIDTH - RADAR_WIDTH;

as to transverse left to right across a sprite, I would have move from 8x8 block of data to another every eight pixels.

Although saying that I could modify the loops slightly to adapt that... :/
_________________
[Blog] [Portfolio]

#12246 - yaustar - Wed Nov 05, 2003 3:39 pm

Code:

if(((b >> 1)<< 1) == b)
                {even_odd = false;}
                else
                {even_odd = true;}
check if it is odd or even
Code:
               
                var_newX = (b >> 3);//divide by eight
                var_newY = (a >> 3);//divide by eight
check which 8x8 block that the pixel is in
Code:

                var_diffX = b - (var_newX << 3);//obtain the reminder
                var_diffY = a - (var_newY << 3);//obtain the reminder
get the reminader which I use to access the right bit in a 8x8 block
Code:

                if(even_odd==false)
                {
                    radar[(var_newY << 8)+(var_diffY << 2) +
                        (var_newX << 5) + (var_diffX >> 1)] |= 128;
                }
                else
                {
                    radar[(var_newY * 256)+(var_diffY * 4) +
                        (var_newX * 32) + (var_diffX / 2)] |= 32768;
                }
edit the pixel colour

var_newY * 256 (each 8*8pixel block has 32 data in it. A row of 8*8 blocks in a 64x64 sprite will therefore have 32*8 = 256 data)

var_diffY * 4 (each line in a 8*8 has 4 data)

var_newX * 32 (to hop to the next horizontal block of 8*8)

var_diffX / 2 (to access the pixel since each data represents teo pixels (256 color mode))[/code]

At the moment, the code stands as:
Code:
void func_drawRadar(s16 var_x, s16 var_y)
{
    register u8 var_newX, var_newY, var_diffY, var_diffX, a, b;//temp variables
    s16 var_curX, var_curY;//temp variables
   
    var_curX = ((var_x + 120) >> 3) - 18;
    var_curY = (((var_y + 80) >> 3) - 18) << 6;
   
    register const u16* col_tile = &Tank_Out_Collision[var_snakeHeight][var_curX + var_curY];
       
    for(a = 0; a < 39; a++)//y
    {
        for(b = 0; b < 39; b++)//x
        {
            if((*col_tile+b & 16)  == 16)
            {
                var_newX = (b >> 3);//divide by eight
                var_newY = (a >> 3);//divide by eight
                var_diffX = b - (var_newX << 3);//obtain the reminder
                var_diffY = a - (var_newY << 3);//obtain the reminder
               
                radar[(var_newY << 8)+(var_diffY << 2) +
                    (var_newX << 5) + (var_diffX >> 1)] |= b & 1 ? 32768 : 128;;
            }
        }
        col_tile+=64;
    }

    radar[585] |= 129;
}

but it comes out very wrong :(
_________________
[Blog] [Portfolio]

#12254 - yaustar - Wed Nov 05, 2003 7:40 pm

I have just tried something as simple as var++ and even then it is still as slow as hell (considering the stage that we are in the game). I just need a new idea to implement it.

Thanks for the help guys :)
_________________
[Blog] [Portfolio]

#12257 - sajiimori - Wed Nov 05, 2003 8:35 pm

Are you saying this runs slow?
Code:

for(a = 0; a < 39; a++)
        for(b = 0; b < 39; b++)
            var++;

You might be doing something else wrong then. A loop like that should run in way less than 1/60 of a second. Are you trying to draw the radar and DMA it all within one vblank or something?

BTW, I understand your math now...I forgot that you wanted to DMA the radar straight into character RAM. Just for fun, I'm still working on a fast (and correct) version.

#12258 - yaustar - Wed Nov 05, 2003 8:36 pm

Quote:

Are you saying this runs slow?
Code:

for(a = 0; a < 39; a++)
for(b = 0; b < 39; b++)
var++;

You might be doing something else wrong then. A loop like that should run in way less than 1/60 of a second. Are you trying to draw the radar and DMA it all within one vblank or something?

yeah, how did you guess....:/

If you would like, I can send you the entire code to muck around with :)
_________________
[Blog] [Portfolio]

#12260 - sajiimori - Wed Nov 05, 2003 9:36 pm

Quote:

yeah, how did you guess....:/

Oh, you can't do that. ;-) You should only do fast things during vblank, such as DMA copies.

Try organizing your program like this:
Code:

main()
{
    init();

    while(1)
    {
        // these things happen during vdraw

        handle_input();
        update_world();
        draw_radar();
        draw_other_graphics();

        wait_for_vsync();

        // these things happen during vblank

        copy_oam();
        copy_radar();
    }
}

Then try using an algorithm like this to draw the radar:
Code:

#define CHAR_PITCH 8
#define CHAR_HEIGHT 8
#define CHAR_WIDTH 8
#define CHAR_SIZE (CHAR_PITCH * CHAR_HEIGHT)
#define RADAR_WIDTH 32
#define RADAR_HEIGHT 32
#define RPITCH (CHAR_SIZE * (RADAR_WIDTH/8))
#define MAP_WIDTH 64

u8 radar[RADAR_WIDTH*RADAR_HEIGHT];

void draw_radar(int map_x, map_y)
{
    u16* tile_block = &Tank_Out_Collision[var_snakeHeight]
                [
                    ((map_x + 120) >> 3) - RADAR_WIDTH/2 +
                    (((map_y + 80) >> 3) - RADAR_HEIGHT/2) * MAP_WIDTH
                ];

    for(u8* row = radar, u8* radar_end = radar + RADAR_HEIGHT*RADAR_WIDTH;
        row < radar_end; row += RPITCH)
    {
        for(u8* chr = row, u8* row_end = row + RPITCH;
            chr < row_end; chr += CHAR_SIZE)
        {
            u16* tile_row = tile_block;

            for(u8* strip = chr, u8* chr_end = chr + CHAR_SIZE;
                strip < chr_end; strip += CHAR_PITCH)
            {
                register u16* tile = tile_row;

                for(register u8* pixel = strip,
                    register u8* strip_end = strip + CHAR_PITCH;
                    pixel < strip_end; ++pixel)
                {
                    if(*tile++ & SOLID_BIT)
                        *pixel = SOLID_COLOR;
                    else
                        *pixel = CLEAR_COLOR;
                }

                tile_row += MAP_WIDTH;
            }

            tile_block += CHAR_SIZE;
        }

        tile_block += (MAP_WIDTH - RADAR_WIDTH) * CHAR_HEIGHT;
    }
}

The reason this code is fast is because the innermost loop is very simple.

The general idea here is to simultaneously "step through" the radar and collision map using pointers. That way, you don't have to recalculate array indexes for each pixel.

The loops are organized around walking through the radar sprite: first through rows of characters, then characters, then rows of pixels, and finally individual pixels.

The tile pointers play an auxiliary role, being created and advanced through the collision map within the bodies of the loops. "tile_block" points to the top left of a block of 8x8 map tiles (corresponding to a sprite character), "tile_row" is the beginning of a row in such a block, and "tile" points to an individual tile.

Only the variables used in the innermost loop should be declared "register". Inlining the function is unnecessary. The 'radar' array should be u8[]. I changed the radar size to 32x32, but it should work with other sizes.

This code is untested. It may not even compile. One problem with it is that it will gladly run off the edge of the collision map.

#12261 - yaustar - Wed Nov 05, 2003 9:42 pm

well this is my while loop as it stands:
Code:

while(1)
    {
        func_getInput(var_x, var_y, str_snakeSprite.spr_no, var_anim_delay, str_guardSprite);
       
        REG_DM3SAD = (u32)radarBlank;//source of data
        REG_DM3DAD = (u32)radar;//target for data
        REG_DM3CNT = (2048 | DMA_16NOW);//start copying NOW, in 16bit values
       
        func_drawRadar(var_x, var_y);
       
        /*for(var_loop = 1536; var_loop < 3584; var_loop++) //96   = radar
           {OAMData[var_loop] = radar[var_loop-1536];}*/
        REG_DM3SAD = (u32)radar;//source of data
        REG_DM3DAD = (u32)&OAMData[1536];//target for data
        REG_DM3CNT = (2048 | DMA_16NOW);//start copying NOW, in 16bit values
       
        //update backgrounds values
        str_bgLayer[0].x_scroll = str_bgLayer[1].x_scroll = str_bgLayer[2].x_scroll = var_x;
        str_bgLayer[0].y_scroll = str_bgLayer[1].y_scroll = str_bgLayer[2].y_scroll = var_y;       
       
        func_guardMove(str_guardSprite);

        sprites[0].attribute2 = str_snakeSprite.spr_no | PRIORITY(func_checkOverlay(0, var_x, var_y));//set priority to 2;

       func_moveSprite(&sprites[1], str_guardSprite.x, str_guardSprite.y);

/**/    UpdateBackground(&str_bgLayer[0]);
/**/    UpdateBackground(&str_bgLayer[1]);
/**/    UpdateBackground(&str_bgLayer[2]);

        WAIT_VSYNC();

        //func_copyOAM();         //Copies sprite array into OAM.
        REG_DM3SAD = (u32)sprites;
        REG_DM3DAD = (u32)OAM;
        REG_DM3CNT = 512 | DMA_16NOW;
    }


One thing that my compilar did not like in your prevouis code was :
Code:

 u16* tile_block = &Tank_Out_Collision[var_snakeHeight]
                [
                    ((map_x + 120) >> 3) - RADAR_WIDTH/2 +
                    (((map_y + 80) >> 3) - RADAR_HEIGHT/2) * MAP_WIDTH
                ];

it gave the error that it cannot convert u16* to u16. Something about the map being a 'const' i think.

I try out the code when I get a chance to fully study it. Thanks :)
_________________
[Blog] [Portfolio]

#12262 - sajiimori - Wed Nov 05, 2003 9:57 pm

Oh, I thought you were doing all that work from the vblank interrupt or something. Anyway, it still seems to me that something is wrong if you can't run a simple loop within one frame. Another thing you'll probably want to do is put the function in IWRAM.

So I guess radarBlank has some background picture? Is that really necessary?

If the error you got was about converting u16* to u16, then it wasn't about constness. That line looks fine to me...


Last edited by sajiimori on Wed Nov 05, 2003 10:00 pm; edited 1 time in total

#12263 - yaustar - Wed Nov 05, 2003 10:00 pm

It is too bascically, 'erase' the entire sprite to transperent. So as a result I draw only the visable walls etc to the radar since DMA is faster then for loops.

IWRAM, heard of it, dont have a clue what the heck it is :/

edit: Internal work ram :p
_________________
[Blog] [Portfolio]

#12264 - sajiimori - Wed Nov 05, 2003 10:08 pm

You don't need a whole array of zeros just to clear a block of memory. The DMA registers can be set up to copy a single value across a range.

Anyway, yes, DMA is faster than loops, but you are looping over the whole radar anyway. There is no avoiding the loop, so you may as well ditch the useless DMA transfer.

You can read all about the gba's memory architecture in the various tutorials and hardware specs out there.

If the function is still not fast enough, try updating only a portion of the radar each frame. For a 32x32 radar, it would still only take 16 frames to update the whole thing one character at a time.

#12265 - yaustar - Wed Nov 05, 2003 10:12 pm

How do 'use' the function to work with the IWRAM and what ram is it using at the moment (by default)?
_________________
[Blog] [Portfolio]

#12266 - sajiimori - Wed Nov 05, 2003 11:04 pm

Code is normally in ROM. This is one way to put a function in IWRAM:
Code:

void __attribute__((section(".iwram"))) function() { }

#12267 - sajiimori - Wed Nov 05, 2003 11:08 pm

BTW, your idea of clearing out the radar before rendering it might be faster if most areas of the map are not solid. If that's the case, fix your DMA so it copies a single value instead of a whole block of memory. If you decide to render a character at a time, you'll obviously only want to clear the character you're about to draw.

#12269 - yaustar - Wed Nov 05, 2003 11:12 pm

would I be able to do this
Code:
#define IN_INRAM __attribute__ ((section (".iwram")))

and then do this:
Code:
void IN_INRAM function() {   }

_________________
[Blog] [Portfolio]

#12270 - sajiimori - Wed Nov 05, 2003 11:17 pm

Also, I just realized that the code I posted earlier won't work if the radar is not the same size as the sprite that it will be copied to. If you do it this way you should be able to make the radar any size less than 64x64:
Code:

#define RADAR_WIDTH (whatever)
#define RADAR_HEIGHT (whatever)
#define RADAR_SPRITE_WIDTH 64
#define RADAR_SPRITE_HEIGHT 64
#define RPITCH (CHAR_SIZE * (RADAR_SPRITE_WIDTH/8))

u8 radar[RADAR_SPRITE_WIDTH*RADAR_HEIGHT];

#12272 - sajiimori - Wed Nov 05, 2003 11:19 pm

Quote:

would I be able to do this

Why don't ya try it instead of asking me?? ^_^

#12273 - yaustar - Wed Nov 05, 2003 11:28 pm

:p
heh...signs of a newbie (damn I got to get rid of these)
_________________
[Blog] [Portfolio]

#12314 - yaustar - Fri Nov 07, 2003 10:49 pm

okay, I am getting these errors:
Code:

register u16* var_tile = &Tank_Out_Collision[var_snakeHeight]
        [
            ((var_x + 120) >> 3) - 20 +
            (((var_y + 80) >> 3) - 20) * 64
        ];//pointer to the collision tile

invalid conversion from `const u16*' to `u16*'

register u8* var_pixel = &radar[0];//pointer to radar sprite

165 C:\DOCUME~1\YAUSTA~1\MYDOCU~1\MYPROJ~1\Code\BG_DMA\main.cpp
cannot convert `u16*' to `u8*' in initialization

_________________
[Blog] [Portfolio]

#12315 - tepples - Fri Nov 07, 2003 11:24 pm

C++ is a bit pickier with pointers than C is. You may have to compare your source code against the header files you use. Some of the gba.h files floating around the Internet are not const-correct, not void *-correct, or neither.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#12317 - yaustar - Fri Nov 07, 2003 11:28 pm

I have the one by Eloist (gba.h), what am I looking for in the gba files?
_________________
[Blog] [Portfolio]

#12322 - sajiimori - Sat Nov 08, 2003 12:25 am

Fix the first error by declaring var_tile const. Do the same for all other tile pointers as well.

The second error is because I changed your radar array to u8[], but I guess you didn't change it accordingly. Is there some reason you wanted it to be u16[]?

Do you have a book on C++? You'll want some sort of reference that will help you solve these sorts of problems on your own.

#12455 - yaustar - Fri Nov 14, 2003 3:46 pm

Here is the new version of the code:
Code:

void func_drawRadar(s16 var_x, s16 var_y)
{
    register u16 var_newX, var_newY, var_diffY, var_diffX;//temp variables
    register int var_curSum;
       
    //var_curX = ((var_x + 120) >> 3) - 18;
    var_x += 120;
    var_x = var_x >> 3;
    var_x -= 18;
   
    //var_curY = (((var_y + 80) >> 3) - 18) << 6;
    var_y +=80;
    var_y = var_y >> 3;
    var_y -= 18;
    var_y = var_y << 6;
   
    var_curSum = var_x + var_y;
   
    for(register u8 y = 0; y < 39; y++)//y
    {
        var_newY = y << 5;
        var_newY &= ~255;
        var_diffY = y & 7;//obtain the reminder
        var_diffY = var_diffY << 2;
        var_newY += var_diffY;
       
        for(register u8 x = 0; x < 39; x++)//x
        {
            if((Tank_Out_Collision[var_snakeHeight][var_curSum + x] & 16)  == 16)
            {
                var_newX = x << 2;
                var_newX &= ~31;
                var_diffX = x & 7;//obtain the reminder
                var_diffX = var_diffX >> 1;
                var_newX += var_diffX;
               
                radar[var_newY + var_newX] |= x & 1 ? 32768 : 128;           
            }
        }
        var_curSum+=64;
    }
    radar[585] |= 129;
}

It is not put in IWRAM yet because I want to check out the speed
_________________
[Blog] [Portfolio]

#12458 - sajiimori - Fri Nov 14, 2003 8:07 pm

I suspect the code will be roughly the same speed as it was before. You did not incorporate the most important optimization: avoiding the recalculation of array indexes on every pixel. All the other optimizations are insignificant in comparison.

#12459 - yaustar - Fri Nov 14, 2003 9:27 pm

yeah, I would kinda of figured that :/
I figure I need to get rid of this code somewhere
Code:

                var_newX = x << 2;
                var_newX &= ~31;
                var_diffX = x & 7;//obtain the reminder
                var_diffX = var_diffX >> 1;
                var_newX += var_diffX;


at the moment I am optimising this code a bit at a time so I dont run into any unexpected errors.
Another for loop would be needed to help transverse across the sprite.
Hopefully shouldn't be too hard ;)
_________________
[Blog] [Portfolio]

#12465 - sajiimori - Sat Nov 15, 2003 3:24 am

Your overall approach is inefficient, and you won't be able to make it efficient by making a series of small changes.

Let me be frank here: Do you know how to use pointer arithmetic?

#12471 - yaustar - Sat Nov 15, 2003 2:36 pm

Not very well, but every time I used them in this code it kept coming up with errors that made no sense.

At the moment, the way I understand it is if I use a u8 pointer to transverse through a u16 array of 2048 length, then I would accessing 8 bits at a time and therefore be going through 4096 times to complete transverse through it.

If I am wrong them please correct me..

edit: forget that, it does not relate to what we are doing here. I giving the pointer stuff another shot at tne moment now that I have a clear head and a bigger insight of what my radar is doing I *should* be able to pull it off :)
_________________
[Blog] [Portfolio]

#12472 - yaustar - Sat Nov 15, 2003 3:38 pm

Code:

IN_IWRAM void func_drawRadar(s16 var_x, s16 var_y)
//void func_drawRadar(s16 var_x, s16 var_y)
{
    register u16 var_newX, var_newY, var_diffY, var_diffX;//temp variables
    register int var_curSum;
    register u8 var_collInc = 0;
    //register const u16* pointerRadar = &radar[0];
       
    //var_curX = ((var_x + 120) >> 3) - 18;
    var_x += 120;
    var_x = var_x >> 3;
    var_x -= 18;
   
    //var_curY = (((var_y + 80) >> 3) - 18) << 6;
    var_y +=80;
    var_y = var_y >> 3;
    var_y -= 18;
    var_y = var_y << 6;
   
    var_curSum = var_x + var_y;
   
    for(register u8 y = 0; y < 39; y++)//y
    {
        var_newY = y << 6;
        var_newY &= ~511;
        var_diffY = y & 7;//obtain the reminder
        var_diffY <<= 3;
        var_newY += var_diffY;
       
        for(register u8 xOuter = 0; xOuter < 320; xOuter+=64)//x block
        {
            for(register u8 xInner = 0; xInner < 8; xInner++)//x reminder
            {       
                if((Tank_Out_Collision[var_snakeHeight][var_curSum + var_collInc] & 16)  == 16)
                {
                    var_newX = xInner + xOuter;
                    var_newX += var_newY;
                   
                    radar[var_newX] = 128;
                }
                var_collInc++;
            }
        }
        var_curSum+=64;
        var_collInc = 0;
    }
    radar[1170] = 129;
}


slowly getting there, I am beginning to understand how your code works on page two but hit a slight snag, my sprites have disappeared and not sure why :/ I think an infinite loop is happening somewhere

btw the radar is an u8 now

edit: me idiot
u8 = 0-255
for(u8< 320)
_________________
[Blog] [Portfolio]


Last edited by yaustar on Sat Nov 15, 2003 6:02 pm; edited 1 time in total

#12476 - yaustar - Sat Nov 15, 2003 5:59 pm

Code:

    for(register u16 yOuter = 0; yOuter < 2560; yOuter+=512)//y block
    {
        for(register u8 yInner = 0; yInner < 64; yInner+=8)//y reminder
        {
            var_newY = yInner + yOuter;//5*8 = 40 times
            for(register u16 xOuter = 0; xOuter < 320; xOuter+=64)//x block
            {
                for(register u8 xInner = 0; xInner < 8; xInner++)//x reminder
                {       
                    if((Tank_Out_Collision[var_snakeHeight][var_curSum + var_collInc]
                        & 16) == 16)
                    {
                        var_newX = xInner + xOuter;
                        var_newX += var_newY;
                       
                        radar[var_newX] = 128;
                    }
                    var_collInc++;//1600 times
                }
            }
            var_curSum += 64;
            var_collInc = 0;
        }
    }

without using pointers, this is as best as I can get it (it doesn't quite work, not yet anyway) Still a bit jubious how pointers would make this faster though if you care to explain :/

edit: fixed it.
_________________
[Blog] [Portfolio]

#12482 - sajiimori - Sat Nov 15, 2003 9:29 pm

Jubious?? heheh
Quote:

if you care to explain :/

Sure, I'll give it a shot.
Code:

void abc()
{
    for(int i = 0; i < 20; ++i)
    {
        f();

        for(int j = 0; j < 10; ++j)
            g();
    }
}

Q: If f() takes 200 CPU cycles to execute, how many cycles are spent in f() while abc() runs?

A: Since f() is called 20 times, 4000 cycles are spent in f().


Q: If g() takes 30 CPU cycles to execute, how many cycles are spent in g() while abc() runs?

A: Since g() is called 200 times, 6000 cycles are spent in g().


Q: What happens to the execution speed of abc() if you optimize f() by 5 cycles?

A: abc() will execute 100 cycles faster.


Q: What happens to the execution speed of abc() if you optimize g() by 5 cycles?

A: abc() will execute 1000 cycles faster.


To optimize an algorithm, focus on the innermost loop because the most time is spent there. Even if f() takes 200 cycles and g() only takes 30, you should still focus on g().

Here is your innermost loop (I've excluded the 'xInner < 8; xInner++' part of the for loop, just for simplicity):
Code:

    if((Tank_Out_Collision[var_snakeHeight][var_curSum + var_collInc]
       & 16) == 16)
    {
        var_newX = xInner + xOuter;
        var_newX += var_newY;
                       
        radar[var_newX] = 128;
    }
   
    var_collInc++;

Here's a modified version that involves fewer operations (but it does not modify var_newX -- hopefully you didn't need it assigned to):
Code:

    if((Tank_Out_Collision[var_snakeHeight][var_curSum + var_collInc]
       & 16) == 16)
    {
        radar[xInner + xOuter + var_newY] = 128;
    }
   
    var_collInc++;

Let's take a look at what kind of CPU operations will have to be performed each time this code is run.
Code:

load           Tank_Out_Collision
load           var_snakeHeight
add            Tank_Out_Collision + var_snakeHeight
dereference    *(Tank_Out_Collision + var_snakeHeight)
load           var_curSum
load           var_collInc
add            var_curSum + var_collInc
add            (*(Tank_Out_Collision + var_snakeHeight)) + var_curSum + var_collInc
dereference    *((*(Tank_Out_Collision + var_snakeHeight)) + var_curSum + var_collInc)
and            & 16
compare        == 16
cond branch    if()

** The following is only executed for solid pixels **

load           radar
load           xInner
load           xOuter
add            xInner + xOuter
load           var_newY
add            xInner + xOuter + var_newY
add            radar + xInner + xOuter + var_newY
load           128
store          *(radar + xInner + xOuter + var_newY) = 128

** End conditional code **

load           collInc
add            1
store          collInc + 1

Here is my innermost loop (with assigning the clear color removed, since you cleared the radar beforehand; I also replaced the #defined constants with your values; and like the earlier example, I excluded the 'pixel < strip_end; ++pixel' part of the for loop):
Code:

    if(*tile++ & 16)
        *pixel = 128;

Here's roughly what's involved:
Code:

load           tile
dereference    *tile
add            1
store          *tile + 1
and            & 16
cond branch    if()

** The following is only executed for solid pixels **

load           pixel
load           128
store          *pixel = 128

** End conditional code **

Alright, that's with no register variables. For register variables, you can remove the 'load' and 'store' operations. By putting xInner and xOuter in registers, yours looks like this (the top part is unchanged):
Code:

load           Tank_Out_Collision
load           var_snakeHeight
add            Tank_Out_Collision + var_snakeHeight
dereference    *(Tank_Out_Collision + var_snakeHeight)
load           var_curSum
load           var_collInc
add            var_curSum + var_collInc
add            (*(Tank_Out_Collision + var_snakeHeight)) + var_curSum + var_collInc
dereference    *((*(Tank_Out_Collision + var_snakeHeight)) + var_curSum + var_collInc)
and            & 16
compare        == 16
cond branch    if()

** The following is only executed for solid pixels **

load           radar
add            xInner + xOuter
load           var_newY
add            xInner + xOuter + var_newY
add            radar + xInner + xOuter + var_newY
load           128
store          *(radar + xInner + xOuter + var_newY) = 128

** End conditional code **

load           collInc
add            1
store          collInc + 1

Here is mine with 'tile' and 'pixel' in registers:
Code:

dereference    *tile
add            1
and            & 16
cond branch    if()

** The following is only executed for solid pixels **

load           128
store          *pixel = 128

** End conditional code **

Note that you cannot expect more than a few variables to be stored in registers.

So hopefully that explains why it's faster: it makes the inner loop simpler when you use pointers to keep track of where you are in the arrays.

Edit: noticed that 'pixel' doesn't have to be dereferenced because the value is never read.

#12483 - yaustar - Sat Nov 15, 2003 9:47 pm

Ah, I see now. (why wasnt I taught this at uni :/ )
Pointers tend to 'tidy' the code up a bit and therefore saves the need to load all the variables like snake height etc.

Now that I have the code in a better state then at the start, pointers should be an easy thing to incoprate (yeah right (-.-;) )

Thanks for the help :)
_________________
[Blog] [Portfolio]

#12484 - yaustar - Sat Nov 15, 2003 11:33 pm

Slight update:
Code:

void func_drawRadar(s16 var_x, s16 var_y)
{
    register u8* var_radarTile;//temp variables
    register const u16* var_curSum;
    //register const u16* pointerRadar = &radar[0];
       
    //var_curX = ((var_x + 120) >> 3) - 18;
    var_x += 120;
    var_x = var_x >> 3;
    var_x -= 18;
   
    //var_curY = (((var_y + 80) >> 3) - 18) << 6;
    var_y +=80;
    var_y = var_y >> 3;
    var_y -= 18;
    var_y = var_y << 6;
   
    var_curSum = &Tank_Out_Collision[var_snakeHeight][var_x + var_y];
   
    for(u16 yOuter = 0; yOuter < 2560; yOuter+=512)//y block, loops 5 times
    {//gets called 5 times
        for(u8 yInner = 0; yInner < 64; yInner+=8)//y reminder, loops 8 tiems
        {//gets called 40 times
            var_radarTile = yInner + yOuter;//must be reset here
            for(register u16 xOuter = 0; xOuter < 320; xOuter+=64)//x block, loops 5 times
            {//gets called 200 time
                for(register u8 xInner = 0; xInner < 8; xInner++)//x reminder. loops 8 times
                {//gets called 1600 times
                    if(*var_curSum++ & 16)
                    {                 
                        *(var_radarTile + xInner) = 128;
                    }
                }
                var_radarTile += 64;
            }
            var_curSum += 24;//64-40
        }
    }
    radar[1170] = 129;
}


but I get this error
Code:

132 C:\DOCUME~1\YAUSTA~1\MYDOCU~1\MYPROJ~1\Code\BG_DMA\main.cpp
invalid conversion from `int' to `u8*'

with this line
Code:

var_radarTile = yInner + yOuter;//must be reset here

and not sure why
_________________
[Blog] [Portfolio]

#12485 - sajiimori - Sat Nov 15, 2003 11:48 pm

Code:

var_radarTile = yInner + yOuter;

Of course it's an error (int + int != u8*). I guess you were trying to calculate an offset, but an offset from what?
Code:

    //var_curX = ((var_x + 120) >> 3) - 18;
    var_x += 120;
    var_x = var_x >> 3;
    var_x -= 18;
   
    //var_curY = (((var_y + 80) >> 3) - 18) << 6;
    var_y +=80;
    var_y = var_y >> 3;
    var_y -= 18;
    var_y = var_y << 6;

Why do you have single lines of code commented out and then replaced with multiple lines that do the same thing? You created 9 lines of text where 2 would do.

Edit: BTW, don't do little optimizations like 'register' until you are done designing your algorithm. They just make a mess. Save fine-tuning until last.

#12486 - yaustar - Sun Nov 16, 2003 12:12 am

The 'turn 2 LOC' into 8 was an optimisation to lower the number of tempoary variables created during run time (read it somewhere).

Still dont fully understand the error, should I change it to a u16? Once I change it to anything else it starts to hate this line
Code:

var_radarTile = &radar[0];


It is not so much of an offset but more of a restart point y wise.
[/code]
_________________
[Blog] [Portfolio]

#12487 - tepples - Sun Nov 16, 2003 12:18 am

yaustar wrote:
The 'turn 2 LOC' into 8 was an optimisation to lower the number of tempoary variables created during run time (read it somewhere).

In my experience, reassigning repeatedly to a variable (such as using +=) actually slows the code down because it makes GCC blind to other optimizations, especially array access optimizations.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#12489 - sajiimori - Sun Nov 16, 2003 12:29 am

I guess I don't understand why you would think that yOuter and yInner would add up to a valid address.

#12490 - yaustar - Sun Nov 16, 2003 12:39 am

got mixed up with the examples
here is my very clumsy way of dealing with it:
Code:

var_radarTile = &radar[0];
var_radarTile += yInner;
var_radarTile += yOuter;


Now time to tune the loops a bit :p
_________________
[Blog] [Portfolio]

#12491 - yaustar - Sun Nov 16, 2003 12:52 am

What is hopefully the last major iteration of the function (*panting*)
Code:

void func_drawRadar(s16 var_x, s16 var_y)
{
    register u8* var_radarTile;//temp variables
    register const u16* var_curSum;
       
    var_x = ((var_x + 120) >> 3) - 18;
    /*var_x += 120;
    var_x >>= 3;
    var_x -= 18;*/
   
    var_y = (((var_y + 80) >> 3) - 18) << 6;
    /*var_y +=80;
    var_y >>= 3;
    var_y -= 18;
    var_y <<= 6;*/
   
    var_curSum = &Tank_Out_Collision[var_snakeHeight][var_x + var_y];
    var_radarTile = &radar[0];
   
    for(u16 yOuter = 0; yOuter < 2560; yOuter+=512)//y block, loops 5 times
    {                                               //gets called 5 times
        var_radarTile = &radar[0];
        var_radarTile += yOuter;
        for(u8 yInner = 0; yInner < 64; yInner+=8)//y reminder, loops 8 tiems
        {                                           //gets called 40 times
            for(u16 xOuter = 0; xOuter < 320; xOuter+=64)//x block, loops 5 times
            {                                       //gets called 200 time
                for(register u8 xInner = 0; xInner < 8; xInner++)//x reminder. loops 8 times
                {                                   //gets called 1600 times
                    if(*var_curSum++ & 16)
                    {                 
                        *(var_radarTile + xInner) = 128;
                    }
                }
                var_radarTile += 64;
            }
            var_curSum += 24;//64-40
            var_radarTile -= 312;//(5*64)-8
        }
    }
    radar[1170] = 129;
}


Please tell it's okay......
_________________
[Blog] [Portfolio]

#12492 - sajiimori - Sun Nov 16, 2003 1:21 am

If it works, that's much better than what you had before (though I still don't know why you put 'var' on many of your variables...so ugly!).

Looks like you have the right idea as far as which variables to make 'register'.

The last significant optimization I see is to get rid of that addition in the inner loop. You can do that by changing the loop iterator to a pointer. That is, since you're just calculating this:

radarTile + 0
radarTile + 1
radarTile + 2
radarTile + 3...

...you may as well do it this way:

u8* tile = radarTile
tile++
tile++
tile++...

#12493 - yaustar - Sun Nov 16, 2003 1:38 am

the var_ prefixes is part of a code standard that our small group has agreed to. We find it makes it easier to what is what.

I think you are talking about the inner loop addition which is this code
Code:

var_radarTile + xInner

Since it only gets called 170 times max per function call due to the if statement , doing an increment out side the if statement, it does '++' 16000 times per function call.

If I am wrong please tell me.

Btw thanks for all your help sajiimori, in this topic ihave learnt 'streamlining algorithms (to an extent)', how pointers can be faster with arrays and a bit of ponter arithmatic :)

You will be added to the credits by name - 'sajiimori'
_________________
[Blog] [Portfolio]

#12506 - sajiimori - Sun Nov 16, 2003 6:34 am

Quote:

Since it only gets called 170 times max per function call due to the if statement , doing an increment out side the if statement, it does '++' 16000 times per function call.

I think I understand your point. (I don't know where you got the number 170 and I think you meant 1600 instead of 16000, but besides that...)

It's true that the addition is only done for solid pixels on the radar, but you are still incrementing xInner 1600 times. Instead of stepping xInner from 0 to 7, why not step a u8* from the start of the strip to the end of it? I mean, if you have to increment *something* 1600 times, you may as well increment the most useful thing. ;-)
Code:

for(register u8* pixel = var_radarTile, register u8* strip_end = pixel + 8;
    pixel < strip_end; ++pixel)
{
    if(*var_curSum++ & 16)
        *pixel = 128;
}

Now there are the same number of increments, but the addition is gone. Any further optimizations would be minor. You could unroll the loop, or rewrite in assembly. Personally, I'd say if it's too slow as it is, just compromize and draw only a portion of the radar each frame. I would be impressed if you get the radar to completely refresh 60 times a second on a 16MHz processor!

Anyway, glad to be of assistance. :-) Send me some screenshots (or a ROM) when you get it working.

#12509 - sajiimori - Sun Nov 16, 2003 9:44 am

Ok, I can't resist mentioning these other things (which aren't serious):

The second line here is useless:
Code:

...
    var_curSum = &Tank_Out_Collision[var_snakeHeight][var_x + var_y];
    var_radarTile = &radar[0];
...

The loop variable here is useless:
Code:

    for(u16 yOuter = 0; yOuter < 2560; yOuter+=512)
    {
        var_radarTile = &radar[0];
        var_radarTile += yOuter;

This is better (note that 'radar + 2560' is a compile-time constant, so no addition is required at runtime):
Code:

    for(var_radarTile = radar;
        var_radarTile < radar + 2560; var_radarTile += 512)
    {

These loops variables aren't used anywhere, and are simple counters:
Code:

        for(u8 yInner = 0; yInner < 64; yInner+=8)
        {
            for(u16 xOuter = 0; xOuter < 320; xOuter+=64)
            {

It's better to count down to zero by decrementing:
Code:

        for(u8 yInner = 8; yInner; --yInner)
        {
            for(u8 xOuter = 5; xOuter; --xOuter)
            {

Now for nit-picking on style:
Code:

var_curSum += 24;//64-40

If you mean 64-40, then just put 64-40. The addition will be performed at compile-time, not run-time.
Code:

    register u8* var_radarTile;//temp variables

Gee, I thought they were permanent. ;-) Comments should only be used to offer useful information.

Most important style note:

Your code is full of what they call 'magic numbers': values whose meanings are not apparent. For example:
Code:

yOuter < 2560
var_radarTile -= 312;
radar[1170] = 129;

Good code should use named constants to build such values. The only numbers that should appear in your code are small numbers with obvious meanings. All other values should be named.

I did use a couple magic numbers in my earlier posts: 120 and 80 for half the screen width and height respectively. They really should have been SCREEN_WIDTH/2 and SCREEN_HEIGHT/2.

There are 2 main reasons for using named constants instead of magic numbers:

1) If you want to change a value, you only have to change where it is defined. If the value is used in multiple places, you don't have to try to track them all down.

2) It's much easier to understand. 't = 10' could mean anything (why 10 instead of 15 or 1000?), but 't = SECONDS_UNTIL_LAUNCH' is crystal clear.

#12518 - yaustar - Sun Nov 16, 2003 4:10 pm

Good news, I have the radar going at full speed (woot) by drawing half of it per cycle of the while loop. The side effect is that it looks slightly disjointed when moving but that is going to a small sacifice of speed :)

PM you email address and I send you the bin file.
_________________
[Blog] [Portfolio]