#48078 - jormundgard - Sun Jul 17, 2005 12:37 am
This might be better asked in Graphics, but I'll give it a try here.
I'm doing some perspective effects with sprites, and to do it right seems to involve lots of divisions (assuming that I'm doing it right, at least). For the background, I've been using a lookup table, but because of all the rotations and such, it didn't seem practical to use one for the sprites. What do people here recommend? Should I just use regular slash (/) division or maybe the BIOS routine? Or is it more reasonable than I realize to use a lookup table?
#48083 - Lord Graga - Sun Jul 17, 2005 1:15 am
Lookup tables are, as far as I know, the fastest possible way to divide. BIOS division is not that fast, and using a regular slash for division (as in x = y / z) is horribly slow and not recommended.
So stick with your tables.
#48115 - poslundc - Sun Jul 17, 2005 6:27 am
I wouldn't shy away from using GCC division (the / operator) for testing/development purposes, considering you should only need to perform one or two divisions per sprite per frame. It might even be fast enough for your final code.
That said, if it's running too slow then you can easily boost performance by switching to the BIOS divide or your own custom routine. As Graga says, LUTs are about the fastest you can get, although I seem to recall from my own sprite experiments that they don't lend themselves too well to LUT division (can't remember the precise reason).
Dan.
#48124 - Mighty Max - Sun Jul 17, 2005 10:32 am
The best way to improve the DIV speed is to reduce the DIVs.
From ARM-Limited:
Quote: |
If a lot of divisions by the same number are wanted, the performance can usually be improved by using one division to calculate the reciprocal of the number, followed by a lot of multiplications by that reciprocal. This slightly reduces the accuracy of the calculations, since they incur two rounding errors rather than one, but this is often an acceptable tradeoff. |
And use shifting as often as possible for 2^n dividers.
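To illustrate the reciprocal idea from the quote, here is a minimal sketch in C. The 16.16 fixed-point format and the names recip_16_16, div_by_recip and scale_all are assumptions made for this example, not anything from the ARM text. It does one real division, then replaces every later division with a multiply and a shift.

```c
#include <stdint.h>

/* One real division: 16.16 fixed-point reciprocal of d (d must not be 0). */
static inline uint32_t recip_16_16(uint32_t d)
{
    return (1u << 16) / d;
}

/* Approximates n / d using a reciprocal r from recip_16_16().
 * n must be below 65536 so that n * r fits in 32 bits. */
static inline uint32_t div_by_recip(uint32_t n, uint32_t r)
{
    return (n * r) >> 16;
}

/* Divides every element by the same d with a single real division. */
void scale_all(uint16_t *v, int count, uint16_t d)
{
    uint32_t r = recip_16_16(d);
    for (int i = 0; i < count; i++) {
        v[i] = (uint16_t)div_by_recip(v[i], r);
    }
}
```

Because the reciprocal is truncated, the result can come out one lower than a true division (99 / 3 gives 32 here instead of 33), which is the accuracy tradeoff the quote describes.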
#48126 - Quirky - Sun Jul 17, 2005 11:05 am
Mighty Max wrote: |
And use shifting as often as possible for 2^n dividers. |
That isn't needed if you have hard-coded divides (e.g. a = b / 16;); the compiler will change them to shifts. Using shifts where divides are intended just tends to obfuscate the code.
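A small sketch of the point about constant divides, with function names made up for this example. For unsigned values a divide by 16 and a shift by 4 always agree, so a compiler can swap one for the other. For signed values C division rounds toward zero, so the compiler typically emits a shift plus a small fix-up for negative inputs rather than a bare shift.

```c
#include <stdint.h>

/* Unsigned: b / 16 and b >> 4 give identical results for every b. */
static inline uint32_t div16_u(uint32_t b) { return b / 16; }
static inline uint32_t shr4_u(uint32_t b)  { return b >> 4; }

/* Signed: division truncates toward zero, e.g. -1 / 16 is 0.
 * Writing the divide keeps that behaviour explicit in the source. */
static inline int32_t div16_s(int32_t b) { return b / 16; }
```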
#48129 - Lord Graga - Sun Jul 17, 2005 12:24 pm
A good thing to remember when multiplying by numbers that are not 2^n in ASM is that you can easily combine multiplications by 2^n with additions/subtractions to get the desired result.
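The shift-and-add trick can be sketched in C as follows (the function names are invented for this example). Each constant is split into powers of two that are added or subtracted. On ARM the barrel shifter can usually fold the shift into the add or subtract instruction itself.

```c
#include <stdint.h>

/* x * 5  = x * 4 + x */
static inline uint32_t mul5(uint32_t x)  { return (x << 2) + x; }

/* x * 10 = x * 8 + x * 2 */
static inline uint32_t mul10(uint32_t x) { return (x << 3) + (x << 1); }

/* x * 15 = x * 16 - x */
static inline uint32_t mul15(uint32_t x) { return (x << 4) - x; }
```

In C you would normally just write x * 10 and let the compiler pick the instruction sequence; the decomposition matters mostly in hand-written assembly.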
#48177 - Miked0801 - Sun Jul 17, 2005 11:55 pm
But not for divides.
For perspective divides in the past, we used a very large (200K or so) LUT to increase its speed (it was actually a reciprocal table so we could multiply). For texturing, you need a divide per scanline per poly. For your algorithm, it's probably closer to 2 divides per sprite per tic, which means an ARM divide routine or even the BIOS one would suffice. BTW, you can overload the internal / routine with your own. You just need to name it correctly.
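A reciprocal table of the kind described above might look roughly like this. The table size, the 16.16 format and the names recip_lut, recip_lut_init and project are all assumptions for the sketch; the table in the post was far larger. On real hardware the table would more likely be precomputed and stored as const data rather than filled at startup.

```c
#include <stdint.h>

#define RECIP_MAX 1024               /* hypothetical depth range */

static uint32_t recip_lut[RECIP_MAX]; /* recip_lut[z] = 65536 / z */

/* Fills the table once; this is the only place a real divide runs. */
void recip_lut_init(void)
{
    recip_lut[0] = 0;                /* no valid reciprocal for z = 0 */
    for (uint32_t z = 1; z < RECIP_MAX; z++) {
        recip_lut[z] = (1u << 16) / z;
    }
}

/* Approximates x / z with a table lookup and a multiply.
 * Requires |x| < 32768 and z < RECIP_MAX so the product fits in 32 bits.
 * The divide by the constant 65536 compiles to a shift with sign fix-up. */
int32_t project(int32_t x, uint32_t z)
{
    return x * (int32_t)recip_lut[z] / 65536;
}
```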
#49153 - jormundgard - Tue Jul 26, 2005 10:58 pm
Somehow I had turned off email announcements so I missed this entire conversation, but I wanted to thank everyone for their advice.
I don't think that LUTs will help out too much here, because there's just too large a field of possible values. But I'm playing with the idea of two separate 1/x LUTs that can be multiplied together. And most of the divisions are by variables (mostly the distance between the viewer and the sprite), so bitshifts are out, as far as I know. I've been avoiding learning how to use SWIs, so I'll probably give that a try first. I also like the idea of overloading the / routine.
The performance is actually OK with the GCC divide, even without optimizations. What I'm worried about is when things like the sound are added on later.
#49494 - dovoto - Sat Jul 30, 2005 9:56 am
devkitarm overrides the '/' with the BIOS divide (last I heard), so using the SWI will not gain you much. There are faster divide routines out there, but if speed is that critical (and it is only that critical if you can't get your desired frame rate by looking at the algorithm itself) then the LUT is really the only option.
#49500 - jormundgard - Sat Jul 30, 2005 2:44 pm
I'm not using devkitarm (I'm using a commercial development set), but I haven't noticed much speedup with the BIOS routines, so I suspect that the compilers I'm using also do this override.
Part of the problem with using LUTs is that I need to know the relative x and y distances between the viewer and the sprite, and that inevitably needs sin(x) and cos(x). Maybe I can calculate these distances in the untransformed x-y plane (whose divisions can probably be as a LUT, at least approximately) then rotate them to x' and y' and proceed from there.
#49525 - Miked0801 - Sat Jul 30, 2005 9:41 pm
How so? Cartesian distance squared will give you a quick, accurate distance between two points cheaply.
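A minimal sketch of the squared-distance idea, with function names chosen for this example. Comparing squared distances avoids both the square root and any division; it works whenever you only need to know whether something is within a given range.

```c
#include <stdint.h>

/* Squared distance between points a and b; no sqrt or divide needed. */
static inline int32_t dist_sq(int32_t ax, int32_t ay, int32_t bx, int32_t by)
{
    int32_t dx = ax - bx;
    int32_t dy = ay - by;
    return dx * dx + dy * dy;
}

/* Returns 1 if a and b are within radius r of each other, else 0.
 * Both sides of the comparison are squared, so the ordering is preserved. */
static inline int within_range(int32_t ax, int32_t ay,
                               int32_t bx, int32_t by, int32_t r)
{
    return dist_sq(ax, ay, bx, by) <= r * r;
}
```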
#49526 - tepples - Sat Jul 30, 2005 9:52 pm
"X and Y distances" probably refers to X and Y components of a displacement vector. In context, this probably refers to a 3D or pseudo-3D game such as Mario Kart, where the camera code has to translate and rotate (this is where the sin/cos come in) from the world coordinate space to the local coordinate space, and then divide by z.
Luckily, sin and cos are the most obvious applications of lookup tables.
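The translate, rotate and divide-by-z sequence could be sketched like this. Everything here is an assumption for illustration: the 16-step circle, the 8.8 fixed-point sine values, and the names sin_lut, world_to_camera and project_side. A real game would use a much finer table (256 or more steps).

```c
#include <stdint.h>

/* sin(angle) * 256 for 16 steps around the circle (22.5 degrees each). */
static const int16_t sin_lut[16] = {
       0,   98,  181,  237,  256,  237,  181,   98,
       0,  -98, -181, -237, -256, -237, -181,  -98
};

static inline int32_t lut_sin(unsigned a) { return sin_lut[a & 15]; }
static inline int32_t lut_cos(unsigned a) { return sin_lut[(a + 4) & 15]; }

typedef struct {
    int32_t depth;  /* distance along the camera's view direction */
    int32_t side;   /* offset perpendicular to the view direction */
} CamSpace;

/* Translates a world point relative to the camera, then rotates it
 * into the camera's local axes using the lookup table. */
CamSpace world_to_camera(int32_t wx, int32_t wy,
                         int32_t cam_x, int32_t cam_y, unsigned angle)
{
    int32_t dx = wx - cam_x;
    int32_t dy = wy - cam_y;
    int32_t s = lut_sin(angle);
    int32_t c = lut_cos(angle);
    CamSpace p;
    p.depth = (dx * c + dy * s) / 256;   /* drop the 8.8 scale */
    p.side  = (dy * c - dx * s) / 256;
    return p;
}

/* Perspective divide: the one division per sprite. Points at or
 * behind the camera return 0 instead of dividing. */
int32_t project_side(CamSpace p, int32_t focal)
{
    return p.depth > 0 ? p.side * focal / p.depth : 0;
}
```

The final divide in project_side is the one that could be swapped for a BIOS call or a reciprocal table if it turns out to be a bottleneck.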
#49556 - jormundgard - Sun Jul 31, 2005 4:53 am
Right, I need x and y (or their rotated counterparts) explicitly, since I need to do calculations like z' = z (x' / x). The details aren't coming to me right now, but it has something to do with x, the location of the projection plane, being the controlling parameter.
#49684 - Miked0801 - Mon Aug 01, 2005 6:38 pm
So you are doing a "Is my actor on screen" check?
#49688 - jormundgard - Mon Aug 01, 2005 6:50 pm
Yeah, but the motion is roughly in one direction and you can't go backwards, so a check in that direction (say, boat.y - obstacle.y) is basically good enough. It's the texture (x,y) -> screen (y,z) that's my main priority. I'll probably have to think about more generic distance checks at some point though.