gbadev.org forum archive

Hi All, I've just released the source code to my GBA 3D engine.

http://www.theteahouse.com.au/gba/index.html

The source contains some hard-to-find fixed-point (24:8) matrix generation, high-speed frustum clipping, DDA texture mapping and 15-bit dynamic lighting. I have no idea how fast this is on a real GBA, but the code will be very usable on future portable devices.

The code is compact and easy to read. Meshes are merged into the 3D world without z-buffers or z-sorting. Non-visible polygons are culled at run time using a map ray-casting algorithm, which also provides a valid back-to-front polygon order. Lighting and shadows are fully dynamic. Ambient and overbright light effects are achieved using pre-calculated LUTs, which also handle the 8-bit to 15-bit pixel conversion. The DDA rendering contains no divides thanks to a pre-calculated 16:16 reciprocal LUT.

Enjoy!
Derek Evans

Impressive.

I did some 3D stuff some time ago, mostly focused on character rendering and animation. My engine supports Modes 3, 4 and 5. Rotated Mode 5 is obviously the fastest.

You can have a look at it at http://heliscar.com/greg/ (full source code included).

I see that your whole engine is written in C. I think I would be able to optimize it a bit by using assembly, if you don't mind.

Maybe we could "work" together to make a full-featured 3D engine :)
_________________
GBA,GC,NGPC,GP32,FPGA,DS stuff at http://torlus.com/

Wow,

hey, that looks pretty good.

But just as you say, it's a little slow,
even on this PC with a 2.4 GHz P4; I thought it would be fluid.

But it's a nice engine. Can you compile it as multiboot?
And send me a copy, so I can test it on hardware
and see the speed (or is IWRAM used?).
_________________
It seems this wasn't lost after all.

Thanks all. I've received a number of emails and feedback, which is fantastic for only 2 days, so I thank everyone.

The speed issue is being worked on at the moment. I've changed the resolution to 120x80, which allowed me to remove the per-span line clipping. The main engine has also been cleaned up and I'm working on making the floor/ceiling sub-height variable, so we can have stairs, and yep, slopes are also coming. They are pretty simple.

Currently the engine feels geometry limited, so I'm planning on rendering far walls using single polygons. Currently each wall cell is 4 polygons. Also, near walls will be subdivided to increase texturing quality, à la PS1.

I will upload the changes tomorrow. Today is kinda busy.

Hopefully all will be running silky smooth soon.

BTW: I'm using VisualBoyAdvance on an AMD 1800. Frame skip=2, Rendering=DirectDraw, Throttle=None, Vsync=Off, Priority=High

The current version flies, but I'm aiming at flattening the average frame rate a little more.

Regards
Derek Evans

It simply looks amazing! It runs very smoothly for real 3D and looks very nice. Do you do per-pixel lighting? I wouldn't do per-pixel lighting on a GBA.

How do you draw your polys? Do you use a Bresenham implementation to draw them?

Do you calculate the reciprocal LUT for all 32 or 16 bit integers? That would be a pretty huge LUT though... How did you do that?

Torlus wrote:

Impressive.

I did some 3D stuff some time ago, mostly focused on character rendering and animation. My engine supports Modes 3, 4 and 5. Rotated Mode 5 is obviously the fastest.

You can have a look at it at http://heliscar.com/greg/ (full source code included).

I see that your whole engine is written in C. I think I would be able to optimize it a bit by using assembly, if you don't mind.

Maybe we could "work" together to make a full-featured 3D engine :)

I've lurked at your engine.. I was surprised at how good it was, although I use GMAX not 3ds for making models (think NWN). What tools do you have for transforming 3d objects into something your engine can use? GMAX can be used for making NWN people or at least editing them, however there are some problems for anything else (smirk).

I think what would be ideal is to have an 'import' tool that creates compilable source files. This might not make sense to you, but if you change a model and then rebuild the game, the change has no effect unless the tool can automatically create the C or C++ source for the model from the data given. I.e. any time you changed a model you would have to manually convert everything to work (done that, got tired of that, have the T-shirt). If you need such tools, tell me how your 3D engine uses the information and I can make a CLI tool to convert 3ds models to C source; then the normal C to object, then linking etc. will work fine.

And now for Metal Gear Solid GBA ;)

Cyb

Cyberman wrote:

I think what would be ideal is to have an 'import' tool that creates compilable source files. This might not make sense to you, but if you change a model and then rebuild the game, the change has no effect unless the tool can automatically create the C or C++ source for the model from the data given. I.e. any time you changed a model you would have to manually convert everything to work (done that, got tired of that, have the T-shirt). If you need such tools, tell me how your 3D engine uses the information and I can make a CLI tool to convert 3ds models to C source; then the normal C to object, then linking etc. will work fine.
Cyb

There are already some tools for model import. They use the Half-Life .smd file format (I chose it because it is human-readable).
There's also a pogoshell plugin, that would allow one to test models without recompiling stuff.
However, if you want to write other export tools, it would be great :)

Torlus wrote:

There are already some tools for model import. They use the Half-Life .smd file format (I chose it because it is human-readable).
There's also a pogoshell plugin, that would allow one to test models without recompiling stuff.
However, if you want to write other export tools, it would be great :)

LOL, what I was really asking was what the engine's internal data format is. Is it documented? It might take a bit to convert the output of GMAX into 'scenery' and objects for the engines. From the character animation in your demo (I remember it from long ago), you do not require just vertical surfaces, which I think might allow for more complex areas. One can always use MediEvil's trick of making anything over a given distance from the view disappear. I do suspect speed will slow down mightily if too many polys are used (grin).

GMAX has a Quake 3 level export and I think Half life (I think) export capability.. I just prefer using the biggest common denominator for import export. That being said I'll have to look into it. I wondered what the SMD files were for. Hmmm perhaps a bit of cleanup. I wonder if I need to install VC++ 7 again .. ugh (hate it) Oh well I'll look into deciphering what the format is that you use.

Cyb

Quote:

Do you do per-pixel lighting? I wouldn't do per-pixel lighting on a GBA.

The lighting is per-vertex and interpolated. The texture mapper is faster without lighting, but the bottleneck is the geometry calculations, so it doesn't make that much difference to the overall speed. I'm adding variable level of detail at the moment, which should fix the speed issues.

Quote:

How do you draw your polys? Do you use a Bresenham implementation to draw them?

Polygon rendering is stock standard DDA texture mapping. A lot of the speed comes from having the clipping, projection, edge scanning and affine mapping in the one function. The inner loop is unrolled and draws 32 pixels in one go. A switch statement handles the smaller chunks of pixels.
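
A minimal sketch of the unrolled-loop-plus-switch idea described above, not the engine's actual code. It is unrolled 4 times instead of 32 to stay short, and `span_unrolled`, `texrow` and the 24:8 texture coordinate are assumptions made for this example.

```c
#include <stdint.h>

/* Write one texel and step u (24:8 fixed point). */
#define PIXEL() do { *dst++ = texrow[u >> 8]; u += du; } while (0)

/* Copy `count` texels along one texture row into `dst`. */
void span_unrolled(uint8_t *dst, const uint8_t *texrow,
                   int32_t u, int32_t du, int count)
{
    while (count >= 4) {        /* unrolled body: 4 pixels per pass */
        PIXEL(); PIXEL(); PIXEL(); PIXEL();
        count -= 4;
    }
    switch (count) {            /* leftover 0..3 pixels */
    case 3: PIXEL();            /* fall through */
    case 2: PIXEL();            /* fall through */
    case 1: PIXEL();
    default: break;
    }
}
```

The switch replaces a second loop for the tail, so a short span costs one jump plus its pixels.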

Quote:

Do you calculate the reciprocal LUT for all 32 or 16 bit integers? That would be a pretty huge LUT though... How did you do that?

The reciprocal LUT isn't that big at all. Mine is only ~300 integers, which is about 1.2 KB. The table is calculated using something like:
for (int i = 0; i < 300; i++) reciprocal[i] = (int)((1 << 16) / (float)max(i, 1) + 0.5f);
You can then divide 200 / 10 using:
(200 * reciprocal[10]) >> 16

There are some rounding issues that you need to look out for, but the system works since all the DDA divides are well under 300. In fact I could get away with a table half that size.
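
A minimal sketch of the reciprocal-table divide, for anyone who wants to try it. The names `RECIP_SIZE`, `recip_init` and `recip_div` are made up for this example, and rounding every entry up is one assumed way of handling the rounding issue, not necessarily what the engine does.

```c
#include <stdint.h>

#define RECIP_SIZE 256          /* hypothetical table size */

int32_t recip[RECIP_SIZE];

/* Fill the table with 16:16 reciprocals, rounded up so that
   (n * recip[d]) >> 16 never lands just below the true quotient. */
void recip_init(void)
{
    recip[0] = 1 << 16;         /* treat divide by 0 as divide by 1 */
    for (int d = 1; d < RECIP_SIZE; d++)
        recip[d] = ((1 << 16) + d - 1) / d;
}

/* n / d without a divide; exact for 0 <= n < 256 and 1 <= d < 256. */
int32_t recip_div(int32_t n, int32_t d)
{
    return (n * recip[d]) >> 16;
}
```

With entries rounded down instead, 200 / 10 would come out as 19, because 200 * 6553 is just short of 20 << 16.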

Kudos Derek. I have been attempting a 3d engine in Mode 4 for some time now so I know how difficult it must have been. Using Bresenham's algorithm to draw the polys then blast them with a fill was creating jagged edges and ugly seams. I wonder if DDA would help. Did you attempt this in Mode 4 before going to 5?

Then there's the 3dsMax parser. Did you get the file format from the handbook of image processing algorithms? I haven't bought it yet. I know there's .bmp, .pcx, .jpeg, etc. but I'm not sure what else it contains.

Anyway, enough of my babbling. Good job.

Thanks Foz,

Yep, Mode 4 is hard. As you would know, GBA video memory only accepts 16-bit (or wider) writes, so to render a single 8-bit pixel you need a read, modify, write system. Or you can use an IWRAM/EWRAM backbuffer and copy the buffer using a DMA command. I found this system very slow going.
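
A minimal sketch of the read, modify, write plot mentioned above. `m4_plot` is a made-up name, and `vram` is just any 16-bit buffer here; on hardware it would be the Mode 4 frame buffer.

```c
#include <stdint.h>

#define M4_WIDTH 240            /* Mode 4 is 240 pixels wide, 1 byte per pixel */

/* Plot one 8-bit pixel using 16-bit accesses only. */
void m4_plot(volatile uint16_t *vram, int x, int y, uint8_t pen)
{
    volatile uint16_t *cell = &vram[(y * M4_WIDTH + x) >> 1];
    uint16_t pair = *cell;                              /* read   */
    if (x & 1)
        pair = (uint16_t)((pair & 0x00FF) | (pen << 8)); /* odd x: high byte */
    else
        pair = (uint16_t)((pair & 0xFF00) | pen);        /* even x: low byte */
    *cell = pair;                                       /* write  */
}
```

Every pixel costs a load and a store, which is why writing two pixels at once is so attractive in this mode.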

Since I was aiming for dynamic lighting I started using two "8-bit to 16-bit" dithering LUTs. The idea is simple. You have a LUT that converts a light and pen to a 16-bit value that contains _2_ 8-bit pixels. You have 2 LUTs that are interchanged for odd and even scan lines. The result is that you can create real-time dithering without slowdown.

There are a number of ways to calculate the LUTs. What I did was place the specular, ambient and diffuse lighting components each in its own LUT pixel. The diffuse was repeated, so you have your 4 dithered pixels. The results were pretty good for 8-bit lighting.

http://www.theteahouse.com.au/gba/dither/4.jpg
http://www.theteahouse.com.au/gba/dither/1.jpg
http://www.theteahouse.com.au/gba/dither/2.jpg

I even tried fog effects.

http://www.theteahouse.com.au/gba/dither/3.jpg

But, at the end of the day, 15-bit lighting just looked better. I'm still using 8-bit textures, but the LUT now converts to 555 15-bit colour, and I'm using a 120x80 display. I fixed up a small rounding error in the reciprocal table which greatly improved texturing quality. Yay for small fixes.
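
A minimal sketch of a light-and-pen LUT that outputs 555 colour, to show the idea. The number of light levels, the "normal" level and all names (`light_lut`, `light_lut_init`, `scale5`) are assumptions for this example, not values from the engine.

```c
#include <stdint.h>

#define LIGHT_LEVELS 32         /* assumed number of light levels */
#define LIGHT_NORMAL 16         /* assumed level that leaves the colour unchanged */

uint16_t light_lut[LIGHT_LEVELS][256];

/* Scale one 5-bit channel and clamp overbright results. */
static int scale5(int c, int light)
{
    c = (c * light) / LIGHT_NORMAL;
    return c > 31 ? 31 : c;
}

/* palette[pen] holds the unlit colour as 15-bit 555, red in the low bits. */
void light_lut_init(const uint16_t *palette)
{
    for (int light = 0; light < LIGHT_LEVELS; light++)
        for (int pen = 0; pen < 256; pen++) {
            int r = scale5(palette[pen] & 31, light);
            int g = scale5((palette[pen] >> 5) & 31, light);
            int b = scale5((palette[pen] >> 10) & 31, light);
            light_lut[light][pen] = (uint16_t)(r | (g << 5) | (b << 10));
        }
}
```

The inner loop then does a single lookup per pixel, something like `*dst++ = light_lut[light][texel];`, so lighting and the 8-bit to 15-bit conversion cost the same as the conversion alone.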

Anyway, I'll be uploading soon and adding more information to the site.

Hope to see you there
Derek Evans

http://www.theteahouse.com.au/gba/

Keywords: GBA 3d, GameBoy Advance, Fixed Point Maths, DDA texture mapping, BSP engine, terrain engine, CUBE, C/C++

Quote:

Then there's the 3dsMax parser. Did you get the file format from the handbook of image processing algorithms? I haven't bought it yet. I know there's .bmp, .pcx, .jpeg, etc. but I'm not sure what else it contains.

Sorry, I never coded that. Torlus wrote the 3D mesh engine.

http://heliscar.com/greg/

Torlus suggested that we combine engines to make a complete engine. Great idea. Sorry I haven't replied to you yet, Torlus. I'm still working on stabilizing the engine.

I'll contact you soon regarding your cool asm rendering code.

Derek

What is DDA? Is DDA something like dividing the width by the height of an edge to get the step to move along? That's how I'm currently doing it. I'm sure with a reciprocal table it would be 100 times faster than a Bresenham implementation, but I couldn't get texture-mapped polygons to work. I don't get how you interpolate the texture coordinates while moving along the polygon edges.

ATM I only want texture-mapped polygons, no lighting or such fancy stuff, but I'm too dumb to get that working :(

I noticed that your frustum culling sometimes culls out visible polys (I especially noticed that on the cubes, or if you go very close to a wall you'll see the backbuffer content of the last frame...)

Yep, DDA (Digital Differential Analyzer) is probably the same as Bresenham (it depends on what book you read). I use DDA algorithms for all my rendering, i.e. lines, rectangles, 2D bitmaps and texture mapping. Some people call it affine mapping. Most perspective-correct mappers are extended DDA mappers that correct the UV every 16 pixels by re-adjusting the destination UV points based on the true Z depth for a given pixel. (What a mouthful.) I have a DDA perspective-correct texture mapper here, but it's not worth using on a GBA. I'll release the code soon. I wrote it for a DOS portal engine.

Calculating UV for edges is the same as calculating X for each edge. As you know, Y increases by 1 for each scanline. So, you calculate the xdelta using something like:

Code:

xx = (x2 - x1) / (y2 - y1); // Calculate X delta.

Where (x1, y1) is the start point and (x2, y2) is the end of the current polygon edge. You then just use:

Code:

x1 += xx;

for each scan line. To implement texture mapping, you just add some (u1, v1)-(u2, v2) coords that select a rectangle on the texture.
Calculate delta U and V using:

Code:

uu = (u2 - u1) / (y2 - y1); // Calculate U delta.

Code:

vv = (v2 - v1) / (y2 - y1); // Calculate V delta.

Again, increase U and V for each scanline.

Code:

u1 += uu; v1 += vv;

Fixed-point maths is a little more complex, but I coded my first DDA texture mapper using floats. It's pretty simple to then swap to 24:8 fixed-point coding.
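
A minimal sketch of the same edge stepping in 24:8 fixed point. The type and function names (`fix8`, `edge_pos`, `edge_delta`, `edge_step`) are made up for this example, and it assumes the compiler shifts negative values arithmetically, as GCC does.

```c
#include <stdint.h>

typedef int32_t fix8;                   /* 24:8 fixed point */
#define FIX8(n)   ((fix8)((n) * 256))   /* integer -> 24:8 */
#define UNFIX8(f) ((f) >> 8)            /* 24:8 -> integer, rounding down */

typedef struct { fix8 x, u, v; } edge_pos;

/* Per-scanline deltas for an edge from (x1, y1) to (x2, y2) with texture
   coordinates (u1, v1) to (u2, v2). Inputs are integers and y2 > y1. */
edge_pos edge_delta(int x1, int y1, int u1, int v1,
                    int x2, int y2, int u2, int v2)
{
    int dy = y2 - y1;
    edge_pos d;
    d.x = FIX8(x2 - x1) / dy;
    d.u = FIX8(u2 - u1) / dy;
    d.v = FIX8(v2 - v1) / dy;
    return d;
}

/* Move the edge position down one scanline. */
void edge_step(edge_pos *p, const edge_pos *d)
{
    p->x += d->x;
    p->u += d->u;
    p->v += d->v;
}
```

With only 8 fractional bits the deltas are truncated, so after 20 steps a u that should reach 64 ends up just under it. That drift is the price of the compact format.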

RE: The frustum culling issues. That's actually caused by VIS culling, which uses ray casting. It's not perfect, as you can see. The collision detection also makes it possible to penetrate cells. That's being fixed.

The culling quality is very PlayStation 1. You will notice a lot of the same effects as were evident in PS1 games. It's due to not using a z-buffer, which would slow down the engine.

Polygon cracks are fixed by rounding Y coords up to the next scan line, i.e. if you have Y = 1.4, then you should start the polygon at scanline 2, not 1. There is also what is called subpixel correction, which takes out the little specks in high-resolution polygons, but this isn't needed on a GBA. Also, subtexel correction improves texture alignment, but again, this isn't needed. My current texture mapper looks as good as PS1 texture mapping (IMHO), so that's good enough for me. Let's work from here.
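
A minimal sketch of the round-up rule for a 24:8 Y coordinate. `first_scanline` is a made-up name, and using the same rule for the bottom of the polygon is an assumption of this example.

```c
#include <stdint.h>

typedef int32_t fix8;           /* 24:8 fixed point */

/* First scanline at or below a 24:8 y coordinate (non-negative y). */
int first_scanline(fix8 y)
{
    return (y + 255) >> 8;
}

/* Number of scanlines covered by an edge from y_top to y_bottom.
   Rounding both ends the same way means two polygons sharing a
   vertex neither overlap nor leave a gap at that row. */
int scanline_count(fix8 y_top, fix8 y_bottom)
{
    return first_scanline(y_bottom) - first_scanline(y_top);
}
```

So Y = 1.4 (358 in 24:8) starts at scanline 2, while Y = 2.0 (512) also starts at scanline 2.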

Derek Evans

thx, that helped me a lot!

I've started to write a simple DDA texture mapping algo in VB; that way it's easier for me to test/debug the code than on a GBA.

But you just told me how to interpolate the tex coords on 1 edge. How could I interpolate the coordinates from the left polygon edge to the right one and fill the pixels in between with my texture? I hope I expressed myself correctly (my English is not the best *cough* ^^)

Oh, that's easy. Let's say you have a convex polygon made from 5 vertices. We render from top to bottom, so you first need to find the highest and lowest vertex, i.e. the vertices with the lowest Y and the highest Y. This will give you your outer loop range from Y1 to Y2. It also gives you the two starting vertices that are used to "trace" around the polygon. Outer code is something like:

Code:

for (y = y1; y < y2; y++)
{
// inner loop code
}

You track the 2 edge lengths, which both start at zero. Since the edges are zero, we set up 2 new edges: one going from the left edge vertex index backwards to the (left edge index - 1) vertex, and one going from the right edge vertex index to the (right edge index + 1) vertex. You need to make sure your vertex indices wrap around your list of the polygon's vertices. So, right vertex 4 links to right vertex 0 (the polygon start).

So, you create your two edges by calculating the new edge lengths and the new delta values for all the vertex information you want interpolated down the edges. In one engine I interpolated seven values for full RGB lit texture mapping (X, Z, U, V, R, G, B). My GBA code just interpolates (X, U, V and light).
You then simply "step" the _two_ edge variable groups for each scanline until you detect that one of the edges ends, i.e. edge length = 0. Again, you set up the new edge and continue until you get to the bottom of the polygon. Note: after the first 2 edges are set up, each of the left and right edges can end or start at different times.

Anyway, that's the idea behind convex polygon rendering. Some people set up the first two edges outside the Y scanning loop, i.e. Michael Abrash style. I prefer to keep the edge generation code together. You can define a structure to store the edge information if you want, e.g.:

Code:

typedef struct {
int x, u, v, l;
} edge_t;

You basically need 3 edges for a convex polygon scan: the left, right and the horizontal scan edge. So, you would have:

Code:

edge_t left, right, scan;

The inner "Affine" loop simply copies the current left edge to the scan edge for each scan line and interpolates across the screen to the right edge. Simple.
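
A minimal sketch of that inner loop using the `edge_t` structure above, with all fields in 24:8 fixed point. `draw_span`, the flat 64x64 `texture` layout and the plain divide for the per-pixel deltas are assumptions for this example (the engine uses a reciprocal table), and the light value `l` is left out to keep it short.

```c
#include <stdint.h>

typedef struct {
    int x, u, v, l;             /* 24:8 fixed point */
} edge_t;

#define TEX_SIZE 64             /* assumed square texture size */

/* Fill one scanline between the left and right edge positions.
   `row` points at the start of the destination scanline. */
void draw_span(const edge_t *left, const edge_t *right,
               const uint8_t *texture, uint8_t *row)
{
    int x1 = left->x >> 8;
    int x2 = right->x >> 8;
    int width = x2 - x1;
    if (width <= 0)
        return;

    edge_t scan = *left;                        /* copy left edge to scan edge */
    int du = (right->u - left->u) / width;      /* per-pixel deltas */
    int dv = (right->v - left->v) / width;

    for (int x = x1; x < x2; x++) {
        int tu = (scan.u >> 8) & (TEX_SIZE - 1);
        int tv = (scan.v >> 8) & (TEX_SIZE - 1);
        row[x] = texture[tv * TEX_SIZE + tu];
        scan.u += du;
        scan.v += dv;
    }
}
```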

Anyway, look at my code if you need a real example.

Derek Evans

Well, I think I got the basic concept of rendering textured polys and I finally worked out a simple implementation of a DDA poly rendering routine, but your code looked like you need way fewer calculations for rendering the textured poly... Anyway, perhaps you could take a short look at my code and tell me if there's a way to optimize the whole thing. ATM it's only some cheesy VB code which only uses floats, but as you said, swapping to fixed-point math ain't that hard.

Code:

void polyc(vert2* p1, vert2* p2, vert2* p3) {
    //edge interpolation
    s32 xpmove, mxp;
    s32 xcmove, mxc;
    //texture interpolation (on edges)
    s32 txpmove, txp;
    s32 typmove, typ;
    //texture interpolation (inside poly)
    s32 itxmove, itx;
    s32 itymove, ity;

    s32 i, ip; //counter
    vert2* swap; //swap var

    if(p2->Y < p1->Y) {
        swap=p2; p2=p1; p1=swap;
    }
    if(p3->Y < p1->Y) {
        swap=p3; p3=p1; p1=swap;
    }
    if(p3->Y < p2->Y) {
        swap=p3; p3=p2; p2=swap;
    }

    i = reciprocal(p3->Y - p1->Y);
    ip = reciprocal(p2->Y - p1->Y);
    //calculate delta for the edge wich goes from top to bottom
    xpmove = (p3->X - p1->X) * i;
    //calculate delta for the edge wich goes from top to middle
    xcmove = (p2->X - p1->X) * ip;

    //calculate texture interpolation on parent edge (top to bottom)
    txpmove = (p3->tX - p1->tX) * i;
    typmove = (p3->tY - p1->tY) * i;

    //calculate the inner texture coordinate interpolation
    //(the value txp/typ has to change each x-pixel inside poly)
    //scale down 11 bits because xcmove-xpmove is damn large
    //because of 16 bit fixed point format (PROBLEM! :()
    i = reciprocal((xcmove - xpmove)>>11);
    itxmove = ((((p2->tX - p1->tX) * ip) - txpmove)>>11) * i;
    itymove = ((((p2->tY - p1->tY) * ip) - typmove)>>11) * i;

    mxc=mxp= p1->X<<16;
    txp = p1->tX<<16;
    typ = p1->tY<<16;

    if (xpmove<xcmove) {

        for(i=p1->Y;i<(p2->Y);i++) {
            mxp += xpmove;
            mxc += xcmove;
            txp += txpmove;
            typ += typmove;
            itx = txp;
            ity = typ;

            for(ip=(mxp>>16);ip<(mxc>>16);ip++) {
                itx += itxmove;
                ity += itymove;
                VideoBuffer[ip*160+i] = bitmap[itx>>16][ity>>16];
            }
        }

        //re-calculate directions for right edge of poly
        xcmove = (p3->X - p2->X) * reciprocal(p3->Y - p2->Y);

        //Inner loop for lower poly part
        for(i=p2->Y;i<(p3->Y-1);i++) {
            mxp += xpmove;
            mxc += xcmove;
            txp += txpmove;
            typ += typmove;
            itx = txp;
            ity = typ;

            for(ip=(mxp>>16);ip<(mxc>>16);ip++) {
                itx += itxmove;
                ity += itymove;
                VideoBuffer[ip*160+i] = bitmap[itx>>16][ity>>16];
            }
        }

    }else{

        for(i=p1->Y;i<(p2->Y);i++) {
            mxp += xpmove;
            mxc += xcmove;
            txp += txpmove;
            typ += typmove;
            itx = txp;
            ity = typ;

            for(ip=(mxc>>16);ip<(mxp>>16);ip++) {
                itx += itxmove;
                ity += itymove;
                VideoBuffer[ip*160+i] = bitmap[itx>>16][ity>>16];
            }
        }

        //re-calculate directions for right edge of poly
        xcmove = (p3->X - p2->X) * reciprocal(p3->Y - p2->Y);

        //Inner loop for lower poly part
        for(i=p2->Y;i<(p3->Y-1);i++) {
            mxp += xpmove;
            mxc += xcmove;
            txp += txpmove;
            typ += typmove;
            itx = txp;
            ity = typ;

            for(ip=(mxc>>16);ip<(mxp>>16);ip++) {
                itx += itxmove;
                ity += itymove;
                VideoBuffer[ip*160+i] = bitmap[itx>>16][ity>>16];
            }
        }
    } //if (xpmove<xcmove)
} //polyc

This code requires the longest side of the poly to be on the left side, but it's just a basic thingie to get started.

I also thought about a z-buffer-like approach: define an array of 16-bit values which stores the distance of every drawn pixel to the cam; when another pixel gets drawn, you can check whether there's already a pixel that's closer to the camera. This way you wouldn't have to sort the geometry, which would speed things up a lot, I think. The only problem is that you'd have to check every pixel; I dunno if this would decrease the speed a lot...

Last edited by Lupin on Fri Aug 22, 2003 3:02 pm; edited 2 times in total

There are some really cool tricks you can use to speed things up a lot, most of which I learned from http://freespace.virgin.net/hugo.elias/
You actually only need to calculate
iTxMove = ((txC - txP) / (cxc - cxp))
and
iTyMove = ((tyC - tyP) / (cxc - cxp))
once at the start, they turn out to be the same every time. It works best if you do them at v2 since that's where the triangle's the widest, so you get the most accuracy, but anywhere is fine. So do like
iTxMove = (P2.tX - (P1.tX + txPMove * (P2.tX - P1.tX))) / (P2.x - (P1.x + mxp * (P2.x - P1.x)))
I hope I got all that right. Basically you want to calculate the x and tX of the point on the long side that's across from the middle vertex. Then calculate iTxMove the same way you already do ((tX - tX) / (x - x)).
And then you'll notice that you don't need to calculate the texture deltas each line, you don't need to do that txC - txP, and thusly there's no point in even having txC. So you just interpolate down the left side of the triangle, and then interpolate accross each line with your precalculated iTxMove. Do all that for the tex Y coords too.
I didn't believe that would work until I tried it, but it does. And when you're writing a tri-filler in ASM, that can save you a lot of registers, so it's quite useful.

Also, to prevent little cracks from popping up between your polys, you'll need to do a little correction to cxc and cxp. Since you can only draw on whole integer Y coordinates (that would be one whole pixel (no half pixels^_^)), your polygon starts slightly above or slightly below where it would really be if you had infinite resolution. So to correct it, you find the x start and x end positions at the 'real' center of the first Y line. It's kind of hard to describe; check the scan-converting page at that site (http://freespace.virgin.net/hugo.elias/graphics/x_polysc.htm) under perfect scan converting for a nice picture.
The correction amount is yErr = (float)(int)(y + 1.0) - y. That's the distance from the current Y to the center of the next pixel. So set cxp to P1.x + mxp * yErr, cxc to P1.x + mxc * yErr, and it doesn't make much difference, but to be fully correct set txP to p1.tX + txPMove * yErr, same for tyP.
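
A minimal sketch of that correction in floats, using the yErr formula exactly as written above. The structure and function names (`edge_start_t`, `edge_prestep`) are made up for this example, and y is assumed to be non-negative.

```c
/* Starting values for one edge after stepping to the first drawn row. */
typedef struct {
    int   row;                  /* first scanline to draw */
    float x, tX, tY;            /* edge values at that scanline */
} edge_start_t;

/* x, tX, tY are the values at the vertex (which sits at height y);
   dx, dtX, dtY are the per-scanline deltas of the edge. */
edge_start_t edge_prestep(float x, float y, float tX, float tY,
                          float dx, float dtX, float dtY)
{
    edge_start_t s;
    float yErr;

    s.row = (int)(y + 1.0f);
    yErr  = (float)s.row - y;   /* distance from the vertex to that row */

    s.x  = x  + dx  * yErr;
    s.tX = tX + dtX * yErr;
    s.tY = tY + dtY * yErr;
    return s;
}
```

For a vertex at y = 1.25 the first row is 2 and yErr is 0.75, so every interpolated value starts three quarters of a step down the edge.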

If none of that made sense, read the stuff on that site. If it did make sense, read it anyway^^
And if it doesn't make sense after that, just try it and see what happens, and if all else fails, I'll try to give a better explanation^^;
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

Your first optimization did work fine! I really don't know why I didn't see that there's room for such an important optimization...

but....

iTxMove = (p2.tX - (p1.tX + txPMove * (p2.tX - p1.tX))) / (p2.X - (p1.X + mxp * (p2.X - p1.X)))
iTyMove = (p2.tY - (p1.tY + tyPMove * (p2.tY - p1.tY))) / (p2.X - (p1.X + mxp * (p2.X - p1.X)))

...doesn't turn out to be the same as:

iTxMove = (((p2.tX - p1.tX) / (p2.Y - p1.Y)) - txPMove) / (mxc - mxp)
iTyMove = (((p2.tY - p1.tY) / (p2.Y - p1.Y)) - tyPMove) / (mxc - mxp)

But this optimization isn't that necessary, because I'm of course going to use a reciprocal LUT, which would perhaps work even faster than your approach.

NOTE: I edited my old code and added the changes

Oh, I forgot about the polygon crack thingie. Do these cracks also appear with fixed point math? I don't think so (and I'm of course not going to use floats on gba ^^)

Yeah, I was using 16.16 fixed point and getting some pretty nasty cracks. You won't really notice it's a problem until you have something rotating around, like a cube. If you're just rendering triangles, so a square is 2 tris next to each other, cracks start popping up right across that line every now and then, which is very icky looking. But that adjustment is pretty easy to add, so you could just wait until later to do that, in case yours isn't as bad as mine was.

But yeah, I thought that bit of code seemed a little screwy. What I was trying to do was calculate the point on the long side across from the middle vertex by taking the slope of the long side times the y distance from the top to the second vertex. I'll try to draw a picture..

Code:

  /|
 / |
/  | /_ calculate x right here
\  | \
 \ |
  \|

That has the long side on the right though cause that made it easier to draw. It's the same code either way.
So, I'll try again with code, using my own var names so I don't get confused^_^

Code:

vMid.x = v1.x + longSlope * (v2.y - v1.y);
vMid.tX = v1.tX + longTXSlope * (v2.y - v1.y);
vMid.tY = v1.tY + longTYSlope * (v2.y - v1.y);
//now this part is only for long side on the left
tXInc = (v2.tX - vMid.tX) / (v2.x - vMid.x);
tYInc = (v2.tY - vMid.tY) / (v2.x - vMid.x);
//...but now that I look at it, I think it will work either way

And yes, do use a reciprocal table. The way I do it is with 30-bit reciprocals, that way you get lots of accuracy. Use this little ASM function to multiply by that

Code:

.global fpMulX
.arm
.align 4
fpMulX:
smull r3, r0, r1, r0
mov r3, r3, LSR r2
rsb r2, r2, #32
orr r0, r3, r0, LSL r2
bx lr

and in your C file

Code:

extern s32 fpMulX(s32 m1, s32 m2, u32 fp);

fp is the fixed point bits, so for the rcp table, give it a 30. You can do it without that rsb if you write it specifically for 30 bits, but it's handy to have around for variable-accuracy fixed point stuff.
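
For testing on a PC, a minimal C sketch that should give the same result as the fpMulX routine above for fp between 1 and 31. `fpMulX_c` is a made-up name, and the code assumes that shifting a negative 64-bit value right is an arithmetic shift, which holds for GCC and other common compilers.

```c
#include <stdint.h>

/* 64-bit product shifted right by fp, truncated to 32 bits. */
int32_t fpMulX_c(int32_t m1, int32_t m2, uint32_t fp)
{
    return (int32_t)(((int64_t)m1 * m2) >> fp);
}
```

For example, with a 30-bit reciprocal of 10 rounded up, `fpMulX_c(200, (1 << 30) / 10 + 1, 30)` gives 20.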

Hope that clears stuff up a little

Your code actually looks fine for PC stuff. Most floating-point maths is as fast as integer maths on current Intel/AMD processors. In fact, if I were aiming for perspective-correct texture mapping on a PC, I'd keep the outer loop floating point. I've seen many DDA texture mappers using floats that feel just as fast as fixed point.

The biggest slowdown with floats is converting to integers. There is a little trick in C/C++ which converts a float to 16:16 fixed point, but I'm not sure how to do it in VB. Look at the QMap or BOOM engine.

Using fixed point just for the inner loop is fast enough. The problem with the GBA is the low resolution. Most polygons in the distance have 32-64 span lengths. My entire GBA viewport is only 120x80! What that means is that, percentage-wise, your outer loop affects your overall speed just as much as your inner loop on a GBA, which is why I spent a lot of time optimizing the entire texturing/clipping function. On a PC this is not the case. With 3 times or more the resolution, your spans only triple in length, but the number of pixels increases by 9 times or more, meaning the outer loop is less important at higher resolutions.

On a PC you can get away with having a float outer loop and a fixed-point inner loop. It looks like you are using GDI pixel rendering in the inner loop. No matter how much you optimize your outer loop code, it's never going to get any faster using GDI pixel plotting.

I'm guessing you could increase your speed quite a bit simply by using a backbuffer DIB and blitting the entire DIB to the DC. I'd swap to Delphi. Visual Basic creates awful code anyway, and Kylix is just brilliant.

If you are planning on coding for the GBA but want to develop under Windows, get Watcom C/C++. It's free off the net. I developed all my GBA code using Watcom before porting. I still have a version of my GBA engine running under DOS.

Derek Evans

thx for all your advice, I'm going to port everything to GBA now.

I don't think that 30-bit precision is that important for a number range of 1/(1 to 200) or so. I'll stick to 16-bit precision because that way I can use a normal mul.

I still have another simple question: is a mul as fast as an add?

because if it's the same speed I could just do:

For i = 1 To (p2.Y - p1.Y)
    cxp = cxp + mxp
    cxc = cxc + mxc

    itX = txPMove * i
    itY = tyPMove * i

    For ip = cxp To cxc
        itX = itX + iTxMove
        itY = itY + iTyMove

        Me.ForeColor = BitMap(itX, itY)
        Me.PSet (ip, i)
    Next ip
Next i

I think that'd work better

Mmm, hard to say. Some multiplies are faster done as adds. You can even use a combination of bit shifts and adds/subtracts to multiply by, say, 240. (Look on the GBA forum for this information.) E.g.:

Code:

A * 5 = (A << 2) + A;
A * 6 = (A << 2) + (A << 1);

Hope that's correct. It's late here and I'm tired, but you get the idea.
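
The same idea as compilable C, as a minimal sketch. The function names are made up for this example, the inputs are assumed to be non-negative, and the 240 case uses 256 - 16, which is one possible decomposition.

```c
#include <stdint.h>

/* a * 5 = a * 4 + a */
int32_t mul5(int32_t a)   { return (a << 2) + a; }

/* a * 6 = a * 4 + a * 2 */
int32_t mul6(int32_t a)   { return (a << 2) + (a << 1); }

/* a * 240 = a * 256 - a * 16 */
int32_t mul240(int32_t a) { return (a << 8) - (a << 4); }
```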

Yep, looping might be faster for small additions, but there will be a point where looping will be slower. You could look into a multiply table or switch statement, but I don't think it would work since the multiplies are fixed point, i.e. they are not small like the divides.

The GBA ARM processor has a multiply instruction and I'm pretty sure the ARM designers would have got it right. Processor design has come a long way since the 6502 and Z80 chips.

The inner loop is what is going to slow you down and you only need multiplies at each vertex, not each scanline.

So, optimize in this order:
1) Inner loop
2) Outer edge/scanline stepping
3) Vertex/edge setup code
4) Projection
5) Frustum clipping
6) Rotation
7) VIS culling
8) Polygon generation

Yep, that's the rendering pipeline in reverse. So, start at the inner loop and work outwards.

Derek

Yeah, that's especially good when writing fast ASM routines. Start by writing the inner loop, figuring out what you need to have in registers, and design the rest of the function around that.
As for multiply speed, yes, adding is faster. Mul takes 2-5 cycles. If the upper 3 bytes of the second factor (in mul r0, r1, r2 it would be the bits of r2 that determine the speed) are all 0 or all 1, then it takes 2 cycles. If the upper 2 bytes are all 0/1, then 3 cycles. Upper 1 byte all 0/1 is 4 cycles, and if the upper byte is something other than all 0/1, then 5 cycles. mull takes one more cycle than mul, so yes, a regular mul would be slightly faster, especially without the function call overhead. Unless you plan to write the final version in ASM, and then you can use smull, and if you're adding/subtracting the thing you multiplied to something else, you can do like
smull r0, r1, r2, r3
add r4, r4, r0, LSR #30
add r4, r4, r1, LSL #2
so it's pretty fast. You could probably get away with throwing away the 2 bits that come from r0 too, since that would only be 3/65536 off at most using 16-bit fp^_^

But anyway, adding is faster than multiplying, but since i should always be less than 256, then multiplying will only take 2 cycles, so it might actually be faster than loading in your temporary itX/itY and adding, since ldr takes 3 cycles.
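
A minimal sketch that turns the cycle rule above into a C function, handy for estimating costs on paper. `mul_cycles` is a made-up name, and the numbers are simply the ones given in this post.

```c
#include <stdint.h>

/* Cycles for `mul rd, rm, rs`: the more upper bytes of rs that are
   all 0 or all 1, the sooner the multiplier finishes. */
int mul_cycles(int32_t rs)
{
    uint32_t v = (uint32_t)rs;
    uint32_t top24 = v >> 8;    /* upper 3 bytes */
    uint32_t top16 = v >> 16;   /* upper 2 bytes */
    uint32_t top8  = v >> 24;   /* upper byte    */

    if (top24 == 0 || top24 == 0xFFFFFF) return 2;
    if (top16 == 0 || top16 == 0xFFFF)   return 3;
    if (top8  == 0 || top8  == 0xFF)     return 4;
    return 5;
}
```

So a scanline counter below 256 always costs 2 cycles, while a full 16:16 value in rs will usually cost 4 or 5. Putting the smaller operand last is the cheap win.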

I replaced my VB code with some C++ code which does the same, but I noticed that my calculation of the inner texture deltas isn't precise enough to allow good-looking texture mapping. I always get cracks within the flow of my texture :(

The shape of the polygon looks ok to me, this is the code wich causes the problems:

//calculate the inner texture coordinate interpolation
//(the value txp/typ has to change each x-pixel inside poly)
//scale down 11 bits because xcmove-xpmove is damn large
//because of 16 bit fixed point format (PROBLEM! :()
i = reciprocal((xcmove - xpmove)>>11);
itxmove = ((((p2->tX - p1->tX) * ip) - txpmove)>>11) * i;
itymove = ((((p2->tY - p1->tY) * ip) - typmove)>>11) * i;

Even if I scale down by 11 bits it has index-out-of-bounds problems as soon as the poly lies in a specific direction (I think it always occurs when the poly has a low slope or something like that ^^)

How big are your textures? Using 24:8 fixed point maths with a reciprocal table does mean you can only have small textures, since you either overflow the 32-bit limit or the reciprocal returns a delta value that overruns the texture border.

ie: If your delta calculations are out then for very large polygons or textures you will find your UV calculations gradually become incorrect. Kinda like aiming for the moon and being 1 degree out. You could fix this by recalculating the delta value after each 16 pixels, but that would slow it down.
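
One way to read that suggestion is to re-anchor the texture coordinate every 16 pixels, so an inexact delta can only drift for 16 steps at a time. The sketch below assumes 16:16 coordinates and uses a 64-bit divide for clarity, which would be slow on real hardware. All names are hypothetical.

```c
#include <stdint.h>

#define RESYNC_PIXELS 16   /* re-anchor interval mentioned in the post */

/* Writes the integer texel index for each pixel of a span into out[].
 * u0 and u1 are 16:16 texture coordinates at the span ends (assumed
 * non-negative here); step is the per-pixel delta, possibly inexact,
 * e.g. from a coarse reciprocal table. */
static void span_u_resync(int32_t u0, int32_t u1, int width,
                          int32_t step, int32_t *out)
{
    int32_t u = u0;
    for (int x = 0; x < width; x++) {
        if (x % RESYNC_PIXELS == 0) {
            /* One exact divide per 16 pixels bounds the accumulated drift. */
            u = u0 + (int32_t)(((int64_t)(u1 - u0) * x) / width);
        }
        out[x] = u >> 16;
        u += step;
    }
}
```

With a 64-pixel span over 64 texels and a step that is about 6% too small, the last pixel still lands on texel 62 instead of sliding back to 59.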

The best way to fix it is to use 16:16 for the texture mapper and increase the reciprocal table's accuracy. But that causes more overflows, which require some ASM to maintain the upper 32 bits of a (64-bit multiply 32-bit) operation. The extra function calls will slow down scanline stepping, so this is really only a solution for ARM asm coders, which I'm not.

Basically what I'm saying is, this style of mapper can only really handle small polygons and small textures. I'm using 64x64 textures and lots of small polygons, which gives me nice lighting.

Try wrapping your UVs using an AND operation, which is a lot faster than a MOD. That's the reason graphics cards only render texture sizes that are powers of 2, i.e. 2, 4, 8, 16, 32, 64, 128, 256, ...
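
A minimal sketch of the AND wrap next to the MOD version, assuming a 64x64 texture and integer texel indices. TEX_SIZE, wrap_and and wrap_mod are names made up for this example.

```c
#define TEX_SIZE 64                /* must be a power of two */
#define TEX_MASK (TEX_SIZE - 1)

/* Wrap with a single AND. The cast keeps the operation well defined
 * for negative indices, which wrap around to the far edge. */
static int wrap_and(int texel)
{
    return (int)((unsigned)texel & TEX_MASK);
}

/* Equivalent wrap with MOD, which needs extra work for negatives. */
static int wrap_mod(int texel)
{
    return ((texel % TEX_SIZE) + TEX_SIZE) % TEX_SIZE;
}
```

Both give the same result for any index, but the AND version only works because the size is a power of 2.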

Derek Evans

hm, my texture is only 8x8 pixels though... I'm sure there's something wrong with the calculation of the inner texture coordinates deltas.

16:16 seems to work fine, even much better than 24:8, and even without asm code because the numbers aren't that big.

Oops, on second look try:

itxmove = ((((p2->tX - p1->tX) * ip) - txpmove) * i) >> 11;

You need to shift the bits after you multiply by the fixed point reciprocal. That probably explains why you never got overflow, since you were losing 11 bits too early.

A 16:16 reciprocal will eat 16 bits, leaving 16 bits for the texture UVs.
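
To see why the order matters, here is a small sketch comparing the two orderings with plain 32-bit integers. The function names are hypothetical, and the late-shift version assumes num * recip still fits in 32 bits and that num is non-negative, which is exactly the limit discussed above.

```c
#include <stdint.h>

/* Shift first, then multiply: the low 11 bits of num are thrown away
 * before they can contribute to the result. */
static int32_t delta_shift_early(int32_t num, int32_t recip)
{
    return (num >> 11) * recip;
}

/* Multiply first, then shift: full precision is kept, as long as the
 * product does not overflow 32 bits (assumption of this sketch). */
static int32_t delta_shift_late(int32_t num, int32_t recip)
{
    return (num * recip) >> 11;
}
```

For num = 1000 and recip = 64, the early shift returns 0 while the late shift returns 31.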

Derek

[censored ;)]

I found out how to do it, now it works perfectly! Thank you so much!

Last edited by Lupin on Sat Aug 23, 2003 8:24 pm; edited 1 time in total

Interesting code, Lupin. You are using the same UV deltas for the entire polygon, which seems to be working. The question is, will that work for all polygons in 3D?

Your polygon's left edge is vertical, meaning your span/texture width percentage is the same, so you can use the same deltas. But this isn't the case when you start moving the polygon a little more.

i.e. you are just skewing the texture, which will not work (I think) in 3D. For example, in 3D textures get "pinched" according to the 3D distance, but skewing doesn't allow pinching unless you use two triangles, and I doubt the texture would line up at the edge.

Anyway, I could be wrong. Send me a copy if you get it working. It might be a solution for small polygons, but I can imagine it might look awful for world rendering.

Derek

Yes, that style does indeed look horrible for worlds, but not too bad on characters, and it is a lot faster and easier to learn. I have yet to write more than a basic (and super-slow) C perspective tri-filler, so I can't really say for sure, but I think you could still use that constant delta trick with u/z, v/z and 1/z, since you interpolate those linearly, the only time it goes non-linear is when you actually divide by 1/z.
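
The u/z and 1/z idea can be sketched like this, using doubles for readability rather than the fixed point a GBA build would need. The helper names are made up for the example. Both quantities are stepped linearly, and the only non-linear part is the final divide.

```c
/* Plain linear interpolation between a and b at fraction t in [0, 1]. */
static double lerp_d(double a, double b, double t)
{
    return a + (b - a) * t;
}

/* Perspective-correct u at screen fraction t between two points with
 * texture coordinates u0, u1 and depths z0, z1 (depths must be non-zero). */
static double perspective_u(double u0, double z0,
                            double u1, double z1, double t)
{
    double u_over_z   = lerp_d(u0 / z0, u1 / z1, t);   /* linear in screen space */
    double one_over_z = lerp_d(1.0 / z0, 1.0 / z1, t); /* linear in screen space */
    return u_over_z / one_over_z;                      /* the per-pixel divide */
}
```

Halfway across a span going from u = 0 at z = 1 to u = 1 at z = 3, this gives u of about 0.25, where a plain affine interpolation would give 0.5.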
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

I set up a better example of the function here:
http://mitglied.lycos.de/lupin003/link.htm

You'll notice odd texture distortions in 2 cases:
a) If the poly gets very thin, the texture won't be scaled correctly (doesn't matter that much though)
b) As soon as the long side of the poly changes, the texture gets totally screwed up and I have no idea why... :(

I'm sure these distortions are just because of the odd 160x120 resolution.

I tried using your 120x80 mode 5. Could you please explain how I could set a pixel in that mode, Derek? The data seems to be stored very differently from the other modes.

Derek,

I think you should keep the rotated mode 5 setup (a 120*160 resolution). With lower resolutions, rendering won't be good enough. There are so many things that could be improved using assembly. If you don't mind, I'll write a poly-drawer in ARM asm to replace your current one.

But at the moment, if you want speed, maybe you should "forget" all the lighting/shading stuff instead, as I don't think it's of much interest on a standard GBA screen, with its dark and poor color rendering.

I tried to contact you, but I haven't been able to find your e-mail on your (nice) website... Feel free to contact me if you'd like us to work together on your engine.
_________________
GBA,GC,NGPC,GP32,FPGA,DS stuff at http://torlus.com/

120x80 feels fine. It's just 240x160 halved. The pixels are exactly twice as big. RE: Lighting vs non-lighting. The GBA screen looks fine, as can be seen from this photo:

http://www.theteahouse.com.au/gba/real.html

The speed issues are my main concern and they have very little to do with lighting. Even when the lighting is removed, the engine doesn't speed up all that much.

I've just added distance quantization, i.e. I round distant cell heights to the nearest cell. Basically this reduces detail. The frame rate has now stabilized. I tried rendering the distant polygons using a standard fill but the engine wasn't any faster.

So, basically, I now have a stable fixed frame rate engine, which was my goal. As for hardware, personally I'm not all that interested in GBA stuff. If I wanted a non-lit 2.5D engine I would have coded a ray-caster or Build style engine.

But there is some very, very nice hardware coming out soon for mobile phones. This is the path I'm preparing myself for. Even if I had a GBA engine running at full speed today, it would be another 6 months before a game could be made.

Personally, I think the GBA is going to lose market share to mobile phones. But we will see.

Derek Evans

http://www.theteahouse.com.au/gba/index.html

yeah, the n-gage looks promising :)

I really like your engine, it's well-written code, but maybe you should write some of the time-critical code in hand-written asm (even on mobiles this would give you a speed increase). If you have enough time, you should perhaps think of doing the main stuff in 100% asm (ARM asm really rocks ;))

Graphics > Mode 5 (120x160) 3D Engine Released

#9590 - Derek - Sat Aug 09, 2003 3:53 pm

#9605 - Torlus - Sun Aug 10, 2003 10:08 am

#9612 - jenswa - Sun Aug 10, 2003 9:09 pm

#9618 - Derek - Mon Aug 11, 2003 12:32 am

#9622 - Lupin - Mon Aug 11, 2003 9:00 am

#9640 - Cyberman - Mon Aug 11, 2003 10:00 pm

#9650 - Torlus - Tue Aug 12, 2003 8:49 am

#9658 - Cyberman - Tue Aug 12, 2003 6:37 pm

#9669 - Derek - Wed Aug 13, 2003 5:49 am

#9763 - Foz - Sat Aug 16, 2003 5:35 pm

#9786 - Derek - Mon Aug 18, 2003 5:06 am

#9791 - Derek - Mon Aug 18, 2003 7:47 am

#9813 - Lupin - Mon Aug 18, 2003 8:36 pm

#9823 - Derek - Tue Aug 19, 2003 3:01 am

#9829 - Lupin - Tue Aug 19, 2003 12:11 pm

#9852 - Derek - Wed Aug 20, 2003 1:27 am

#9860 - Lupin - Wed Aug 20, 2003 1:30 pm

#9868 - DekuTree64 - Wed Aug 20, 2003 5:18 pm

#9870 - Lupin - Wed Aug 20, 2003 6:38 pm

#9871 - Lupin - Wed Aug 20, 2003 6:51 pm

#9873 - DekuTree64 - Wed Aug 20, 2003 7:59 pm

#9880 - Derek - Thu Aug 21, 2003 1:23 am

#9901 - Lupin - Thu Aug 21, 2003 1:18 pm

#9905 - Derek - Thu Aug 21, 2003 2:31 pm

#9907 - DekuTree64 - Thu Aug 21, 2003 3:58 pm

#9945 - Lupin - Fri Aug 22, 2003 3:07 pm

#9955 - Derek - Sat Aug 23, 2003 12:29 am

#9956 - Lupin - Sat Aug 23, 2003 11:16 am

#9958 - Derek - Sat Aug 23, 2003 11:51 am

#9959 - Lupin - Sat Aug 23, 2003 1:16 pm

#9988 - Derek - Sun Aug 24, 2003 6:13 am

#9989 - DekuTree64 - Sun Aug 24, 2003 6:30 am

#9991 - Lupin - Sun Aug 24, 2003 9:35 am

#10052 - Lupin - Mon Aug 25, 2003 8:59 pm

#10054 - Torlus - Mon Aug 25, 2003 9:44 pm

#10313 - Derek - Tue Sep 02, 2003 9:21 am

#10322 - Lupin - Tue Sep 02, 2003 1:30 pm