gbadev.org forum archive

Hey all.

I've been working on a 3d engine for the GBA for a couple of years (on and off), and have decided that it is time to give it a major overhaul so that it can compete with more mature engines.
While I should worry the most about structuring my faces and vertices in the most optimal way, I am still playing around with my buggy linear textured polyfiller (which has to be rewritten fully because it sucks and is linear).
God damn those massive overheads, there are too many variables that needs to be handled gracefully.

But, what I want to ask:

Here is my current inner loop for the polyfiller:

Code:

while(toplength--)
{
   int startx = xleft>>8;
   int len = (xright>>8) - startx;
   if(y >= 0 && y < 160 && xright > 0 && startx < 240) {
      unsigned uv = v | (u << 16);
      if(startx < 0) {
         uv -= startx*duvx;
         len += startx;
         startx = 0;
      }
      if(startx+len > 240) len = 240 - startx;
      if(len > 0 ) {
         if(startx & 1) {
            //Draw first uneven pixel
         }
         if(len > 0 ) {
            //Fill pixels
         }
      }
   }
   y++;
   if(y > 159) return;

   xleft += dxleft;
   xright += dxright;
   u += dug;
   v += dvg;
}

I was thinking about doing this:

Code:

while(toplength--) {
   int startx = xleft>>8;
   int len = (xright>>8) - startx;
   if(len < 1 || y < 0 || y > 159 || xright < 0 || startx > 240)
      goto nodraw;

   unsigned uv = v | (u << 16);

   if(startx < 0) {
      uv -= startx*duvx;
      len += startx;
      startx = 0;
   }

   if(startx+len > 240)
      len = 240 - startx;

   if(len < 1 )
      goto nodraw;

   if(startx & 1) {
      //Draw first uneven pixel
   }

   if(len > 0 ) {
      //Fill Scanline
   }
nodraw:

   y++;
   if(y > 159) return;

   xleft += dxleft;
   xright += dxright;
   u += dug;
   v += dvg;
}

Would this create better or worse code? I am afraid that it could add some complexity to how the program would handle the stack.

Cheers.

Are you sure it pays off to optimize for thin polygons? :)

Using 'goto' often ties the optimizer's hands.

Compare the assembly output -- anything less is like working blindfolded.

Well put. I guess the overhead is way more important.

I think you said something about storing x's, y's, u's and v's as local variables once, or am I wrong? It seems to make sense to me, but it is just such a huge mess to have 16-20 variables (x, y, u, v for top/middle/bottom/left/right) in the air while calculating delta values using dotproducts.
Should I be calculating 5 dot products pr. face? (area, u/v horizontal and vertical growths). It takes up shitloads of CPU when I do it like this:

Code:

int d_u_t = p->u[top] - p->u[middle];
int d_v_t = p->v[top] - p->v[middle];
int d_u_b = p->u[bottom] - p->u[middle];
int d_v_b = p->v[bottom] - p->v[middle];

int dux = (rcparea * ((ytop - ymiddle) * d_u_b - d_u_t * (ybottom - ymiddle)))>>8;
int dvx = (rcparea * ((ytop - ymiddle) * d_v_b - d_v_t * (ybottom - ymiddle)))>>8;
int duy = (rcparea * (d_u_t * (xbottom - xmiddle) - d_u_b * (xtop - xmiddle)))>>8;
int dvy = (rcparea * (d_v_t * (xbottom - xmiddle) - d_v_b * (xtop - xmiddle)))>>8;

My rule: If you have to worry about it, just write the assembly yourself.

But to answer your specific question, it depends on the compiler. Benchmark it, or just look at the assembly output of each version.
I would just stick with nesting by default since it theoretically shouldn't make a difference, and the gotos don't simplify the code at all.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

I guess I'll go with that and concentrate on building a better program structure using structs for faces and vertices without arrays and linked lists for faces so that I can easely remove faces from the projection list using backface culling before sorting :)

LG.

I usually do some calculations to find a theoretical max performance, and then implement something that's easy to work with. If the performance is way off from the theoretical max, THEN I start worrying about optimizing.

You can accomplish the same thing as your 2 gotos with a little brace nesting - and teh compiler will understand that better.

But as speed is king here, you'd be better off with assembler so you can interleave your pixel drawing with your loop management overhead - after you are sure you algorithm is as fast as it can be.

kusma wrote:

Are you sure it pays off to optimize for thin polygons? :)

+1.

There's no code shown for "// FILL PIXELS", but I'm guessing that you're already unrolling your scan-line (horizontal) render loop? This is the most critical part of the poly rendering algorithm, so optimizing in Y will only give marginal improvements (with the exception of the above mentioned thin-in-X polys).

And if you want slightly better performance beyond unrolling, set some registers with startx, len, uv, and duvx, then hand-tool that most inner loop of the scan-line renderer with a few lines of assembly, preferrably unrolled as well.
_________________
"Artificial Intelligence is no match for natural stupidity."

DekuTree64 wrote:

My rule: If you have to worry about it, just write the assembly yourself.

Frankly, I think this is the correct answer.

-Brendan

gbadev.org forum archive

C/C++ > goto vs. nests in timecritical polyfilling code.

#150304 - Lord Graga - Fri Feb 01, 2008 7:21 pm

#150305 - kusma - Fri Feb 01, 2008 7:42 pm

#150307 - sajiimori - Fri Feb 01, 2008 7:48 pm

#150308 - Lord Graga - Fri Feb 01, 2008 7:53 pm

#150309 - DekuTree64 - Fri Feb 01, 2008 7:57 pm

#150310 - Lord Graga - Fri Feb 01, 2008 8:05 pm

#150311 - kusma - Fri Feb 01, 2008 8:17 pm

#150327 - Miked0801 - Sat Feb 02, 2008 3:12 am

#153578 - SevenString - Wed Apr 02, 2008 7:34 am

#153784 - sgeos - Sun Apr 06, 2008 7:24 am