#159927 - Rajveer - Tue Jul 08, 2008 2:11 pm
Hi guys, just getting back into development of my game. I'm running into some precision issues with 32-bit arithmetic (since my levels are huge) and was considering changing aspects of my arithmetic to 64-bit with long longs, and writing my own 64-bit arithmetic functions. Before I waste any more time, I want to know if it will be worth doing this performance wise, since we're dealing with a 32-bit processor?
#159928 - silent_code - Tue Jul 08, 2008 2:26 pm
The question is: is it really needed? What sort of huge levels do you display? I can think of better ways to deal with that issue, but then, it depends on the game you're making.
Please, give us some more info.
_________________
July 5th 08: "Volumetric Shadow Demo" 1.6.0 (final) source released
June 5th 08: "Zombie NDS" WIP released!
It's all on my page, just click WWW below.
#159929 - TwentySeven - Tue Jul 08, 2008 2:32 pm
Are you sure you're not just running into scale issues? Can you tell us more about the smallest and largest feature size of your map format?
As an example, you may very well have a 16_16 fixed point vertex format defining your world, but nothing really needs to be smaller than, say, 16_4, making 28_4 a better format?
Tell us more!
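The trade-off TwentySeven describes can be sketched numerically; X_Y is read here as X integer bits and Y fractional bits in a signed 32-bit word (an assumption, since the post doesn't spell it out), and the helper names are invented:

```c
/* Range and step of a signed 32-bit fixed-point format with 'frac'
   fractional bits: range is +/- 2^(31 - frac), step is 2^-frac. */
static double fx_range(int frac) { return (double)(1u << (31 - frac)); }
static double fx_step(int frac)  { return 1.0 / (double)(1u << frac); }

/* 16_16: range +/- 32768 units, step 1/65536 -- precise but a small world */
/* 28_4:  range +/- 2^27 units,  step 1/16    -- coarse but a huge world   */
```

So moving fractional bits into the integer part trades resolution for range without leaving 32-bit arithmetic.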
#159931 - Maxxie - Tue Jul 08, 2008 2:45 pm
On the actual 64bit question.
Yes, the ARM9 supports the double-word (64-bit) v5TE instructions: loads, multiply long, multiply & accumulate and such.
The hardware divider (ports 0x04000290+) also supports 64 bit.
But yes, the C operators you build from these are still slower than their 32-bit counterparts. Not as slow, though, as doing 64-bit operations with pure 32-bit instructions.
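The multiply-long instructions Maxxie mentions are what GCC emits for the usual cast-both-operands pattern, so a 32x32 -> 64 multiply doesn't need hand-written assembly. A sketch in the libnds mulf32 style (assuming f32 means 20.12 fixed point, as elsewhere in the thread):

```c
#include <stdint.h>

/* Casting both operands to 64-bit before the multiply lets the compiler
   emit a single SMULL (signed multiply long) on the ARM9 rather than a
   software 64x64 routine. */
static inline int32_t mulf32(int32_t a, int32_t b) {
    int64_t result = (int64_t)a * (int64_t)b;  /* 32x32 -> 64 bits */
    return (int32_t)(result >> 12);            /* back to 20.12 */
}
```

The key is casting *before* the multiply; `(int64_t)(a * b)` would truncate first and widen the already-overflowed result.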
_________________
Trying to bring more detail into understanding the wireless hardware
#159933 - elhobbs - Tue Jul 08, 2008 3:53 pm
Does devkitARM have native support for 32-bit by 32-bit multiplies to a 64-bit result? Is some assembly code needed to make use of this? I currently just cast both operands to long long, though I suspect that just ends up being done in software.
#159935 - Rajveer - Tue Jul 08, 2008 4:21 pm
Well currently I'm translating and scaling all of my geometry on load to fit into the v16 (4_12) type, and when collision testing against collidable models I multiply by scale and add the translation variables. It works pretty fast, but I'm going to be using display lists soon so I can get rid of that v16 limit (although I'll probably keep this as storing vertices as v16 saves space).
Anyways, so when I do collisions I multiply by scale and add the translation to get back the original vertex, and store it as an int in f32 format, so atm the maximum I can have is 2,147,483,647/4096 ≈ 524,287 in non-fixed-point terms. My actual vertices aren't anywhere near this order of magnitude, but for large levels the calculations can get close to or exceed this value. It's a fast-paced racing game, so even though the player is small they cover a lot of ground, which is why levels are fairly big!
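The overflow Rajveer is hitting comes from intermediate products, not the coordinates themselves; a sketch of the arithmetic (helper name invented):

```c
#include <stdint.h>

/* f32 is 20.12 fixed point: 1 unit == 4096, so the largest representable
   coordinate is INT32_MAX / 4096, about 524287.999 units. But a product
   of two coordinates overflows 32 bits far sooner unless widened. */
static int64_t squared_f32(int32_t x_f32) {
    /* 600 units squared is ~6e12 raw -- must be computed in 64 bits */
    return ((int64_t)x_f32 * x_f32) >> 12;   /* result back in .12 */
}
```

Even a modest 600-unit coordinate squares to a value that no longer fits in an int32, which is why the collision math blows up long before the vertices themselves do.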
#159937 - silent_code - Tue Jul 08, 2008 5:03 pm
Do you use any scene partitioning / management?
Because, you only have to check for collisions with near by colliders and scenery. That should be pretty "simple" to do in a racing game.
This would effectively cap some calculations, because you would only have to check adjacent partitions, which wouldn't be too big, reducing the needed precision for those calculations...
I hope that's clear enough. :^)
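A minimal sketch of the partitioning idea, using a uniform 2D grid for simplicity rather than any particular scheme; cell size, grid dimensions, and function names are all made-up values for illustration (coordinates assumed non-negative):

```c
#define CELL_SIZE 64   /* world units per cell (assumed) */
#define GRID_W    32
#define GRID_H    32

static int cell_of(int coord) { return coord / CELL_SIZE; }

/* Collect the 3x3 block of cells around (px, pz); a real implementation
   would then walk only the collider lists stored in those cells. */
static int cells_to_test(int px, int pz, int out[9][2]) {
    int n = 0;
    for (int dx = -1; dx <= 1; dx++)
        for (int dz = -1; dz <= 1; dz++) {
            int cx = cell_of(px) + dx, cz = cell_of(pz) + dz;
            if (cx >= 0 && cx < GRID_W && cz >= 0 && cz < GRID_H) {
                out[n][0] = cx;
                out[n][1] = cz;
                n++;
            }
        }
    return n;
}
```

Only the colliders in those few cells ever get tested, which is what caps the coordinate ranges involved in any single collision calculation.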
_________________
July 5th 08: "Volumetric Shadow Demo" 1.6.0 (final) source released
June 5th 08: "Zombie NDS" WIP released!
It's all on my page, just click WWW below.
#159952 - a128 - Tue Jul 08, 2008 9:00 pm
Rajveer wrote: |
Well currently I'm translating and scaling all of my geometry on load to fit into the v16 (4_12) type, and when collision testing against collidable models I multiply by scale and add the translation variables. It works pretty fast, but I'm going to be using display lists soon so I can get rid of that v16 limit (although I'll probably keep this as storing vertices as v16 saves space).
Anyways, so when I do collisions I multiply by scale and add the translation to get to the original vertex and store it as an int in f32 format, so atm the maximum I can have is 2,147,483,648/4096 = 524,287 in non-fixed-point format. My actual vertices aren't anywhere near this order of magnitude, but for large levels the calculations can get close or exceed this value. It's a fast-paced racing game so even though the player is small they cover a lot of ground, which is why levels are fairly big! |
Wait ....there is somethink wrong.
Why want't so big numbers
Take a unit of 1m/1feet and with the range of -1000....1000 you have a BIG BIG level
#159953 - a128 - Tue Jul 08, 2008 9:02 pm
silent_code wrote: |
Do you use any scene partitioning / management?
Because, you only have to check for collisions with near by colliders and scenery. That should be pretty "simple" to do in a racing game.
This would effectively cap some calculations, because you would only have to check adjacent partitions, which wouldn't be too big, reducing the needed precision for those calculations...
I hope that's clear enough. :^) |
For racing games you can use a sector approach, so your racing map is just a straight 2D map.
Check only those cars against other cars which are in the same sector or the next sector (to handle the overlap problem).
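A sketch of that sector check, assuming sectors are numbered in lap order and wrap around at the start/finish line; the type and count are illustrative:

```c
#define NUM_SECTORS 64   /* assumed sector count along the lap */

typedef struct { int sector; } Car;

/* Same or neighbouring sector, including the wrap between the last
   sector and sector 0 at the start/finish line. */
static int sectors_adjacent(int a, int b) {
    int d = a - b;
    if (d < 0) d = -d;
    return d <= 1 || d == NUM_SECTORS - 1;
}

static int should_test(const Car *a, const Car *b) {
    return sectors_adjacent(a->sector, b->sector);
}
```

Pairwise car tests then drop from O(n^2) over the whole field to only the handful of cars sharing a sector boundary.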
#159960 - ritz - Tue Jul 08, 2008 9:22 pm
Maxxie wrote: |
The hardwar dividing system (ports 0x04000290+) also support 64 bit. |
In regards to the hw div, I wrote something that significantly speeds up my entire game (ymmv). At least half of all my div stuff is now doing half the clock cycles.
I replaced all my calls to libnds' divf32 with:
Code: |
static inline int32 v_divf32(int32 num, int32 den)
{
    switch (num > 0x7FFFF || num < -0x80000)
    {
        case 0: /* numerator survives the << 12: cheaper 32/32 divide */
            DIV_CR = DIV_32_32;
            DIV_NUMERATOR32 = num << 12;
            break;
        case 1: /* would overflow the << 12: use the 64/32 divide */
            DIV_CR = DIV_64_32;
            DIV_NUMERATOR64 = ((int64)num) << 12;
            break;
    }
    DIV_DENOMINATOR32 = den;
    while (DIV_CR & DIV_BUSY);
    return DIV_RESULT32;
} |
Also made new normalizef32 calls to use this too.
#159961 - Maxxie - Tue Jul 08, 2008 9:27 pm
Bear in mind that this is not interrupt safe. If any of your IRQs uses this divider too, you'll break it when it interrupts between the parameter writes and the result read.
I'd add something to save and restore REG_IME, or set the I flag in the CPSR (either way: for as short a time as possible), to prevent this.
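The save/restore pattern Maxxie suggests could look like this sketch. REG_IME is modeled as a plain variable so the sketch compiles anywhere; on the DS it is the memory-mapped interrupt master enable at 0x04000208, and the divide body here is a stand-in for the real DIV register sequence:

```c
static unsigned REG_IME = 1;   /* stand-in for the real register */

static int divf32_safe(int num, int den) {
    unsigned ime = REG_IME;    /* save interrupt master enable */
    REG_IME = 0;               /* no IRQ can touch the divider now */
    /* stand-in for: write DIV_CR / numerator / denominator, poll busy,
       read DIV_RESULT32 -- a 20.12 fixed-point divide */
    int result = (int)(((long long)num << 12) / den);
    REG_IME = ime;             /* restore whatever state we found */
    return result;
}
```

Restoring the saved value, rather than blindly writing 1, matters if the function can itself be called with interrupts already disabled.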
_________________
Trying to bring more detail into understanding the wireless hardware
#159962 - ritz - Tue Jul 08, 2008 9:28 pm
Good point, thanks for the info.
#159966 - Miked0801 - Tue Jul 08, 2008 9:54 pm
Make your 'num' variable unsigned and then you only need to check against the high range (num > 0x7FFFF). Also, why use switch? Why not just if(), as there are only ever 2 cases...
#159969 - ritz - Tue Jul 08, 2008 10:05 pm
Miked0801 wrote: |
Make your 'num' variable unsigned and then you only need to check against the high range (num > 0x7FFFF). |
What about negative numerators? Maybe I don't see what you're suggesting.
Miked0801 wrote: |
Also, why use switch? Why not just if() as there's only every 2 cases... |
My testing showed me that the switch statement was faster (i.e. the compiler and my settings produced better asm?)
#159995 - sajiimori - Wed Jul 09, 2008 3:21 am
You can keep using negative numbers -- just treat your signed number as unsigned for the purposes of the comparison. Negative numbers, when treated as unsigned, are just large positive numbers, so you only need a single unsigned comparison. Edit: Still, I don't see how your particular check could be condensed into a single comparison...
An 'if' would definitely be faster than a 'switch', as long as you only check for nonzero. That is, don't check for the exact value 1 -- just do the comparison and treat it as a true or false proposition, not a numeric value.
If you somehow end up with better asm with the switch, maybe you can post a comparison of the asm output between the two versions -- it would definitely be an anomaly in the optimizer, worth fixing in GCC. Just be sure to use -O2 or higher.
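For what it's worth, there is a classic bias trick that does collapse this particular two-sided range check into a single unsigned comparison, along the lines sajiimori and Miked0801 are gesturing at (sketch; the helper name is invented, and unsigned arithmetic is used so the wrap-around is well defined):

```c
#include <stdint.h>

/* Equivalent to: num > 0x7FFFF || num < -0x80000.
   Adding 0x80000 maps the valid range [-0x80000, 0x7FFFF] onto
   [0, 0xFFFFF]; anything outside wraps to a large unsigned value,
   so one unsigned compare covers both ends. */
static int needs_64bit_div(int32_t num) {
    return (uint32_t)((uint32_t)num + 0x80000u) > 0xFFFFFu;
}
```

Compilers typically turn this into one add and one compare, with no branch on the sign.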
#160003 - ritz - Wed Jul 09, 2008 6:03 am
sajiimori wrote: |
An 'if' would definitely be faster than a 'switch', as long as you only check for nonzero. That is, don't check for the exact value 1 -- just do the comparison and treat it as a true or false proposition, not a numeric value. |
That is how I initially had it: if (num > 0x7FFFF || num < -0x80000) { } else { }
Then I had read that switch statements can sometimes be faster than an if/else due to the way the compiler organizes jump tables or some such. I don't know a whole lot about pure asm, but I thought I'd give it a whirl and it ended up a bit faster. In the end it's a pretty trivial difference.
sajiimori wrote: |
use -O2 or higher. |
I definitely do :)
Anyway, for my particular code & data, I got some decent speed-up simply by not using DIV_64_32 when I didn't need to. That's all really.
Thanks, I appreciate the tips and info!
#160019 - Rajveer - Wed Jul 09, 2008 10:44 am
Hi guys! Yep, I do use spatial partitioning; I'm using an octree atm. The problem is when a triangle is so far from the origin that its vertices are fairly large integers, and therefore the calculations also get pretty large. I'm not sure if I should be translating triangles and players to the origin before doing the calculations; perhaps this would help. But looking at the code, mulf32 calculates a long long anyway, so returning a long long instead of an int shouldn't cause much slowdown, right? These long values will mainly be used for comparisons against each other. Also, ritz's division function looks pretty interesting in that it uses 64-bit division only when needed, so that may also be beneficial (cheers ritz!)
A128: Currently my ships are around 1.5 units and the test track is over 1000*1000. Reason being that in the world of fast futuristic racing games, a 1000*1000 track isn't large at all ;) I won't be going REALLY big, but I don't want to limit myself, as later on I may want to create some large single-lap courses.
#160041 - sajiimori - Wed Jul 09, 2008 6:14 pm
Which calculations are getting too large? If it's for collision tests, yes, treat one of the objects as the origin, and work in terms of a vector from one to the other.
If you're doing swept collision tests, also subtract off the motion of one of the objects (adding it to the other object instead), so one object is stationary at the origin. This is a common way to do collision tests.
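The relative-motion setup sajiimori describes can be sketched like this, with a hypothetical integer Vec3 standing in for whatever vector type the engine uses:

```c
typedef struct { int x, y, z; } Vec3;

/* Fold object B into the origin: express A's position relative to B,
   and subtract B's velocity from A's so B is stationary for the sweep. */
static void make_relative(Vec3 *posA, Vec3 *velA,
                          const Vec3 *posB, const Vec3 *velB) {
    posA->x -= posB->x;  posA->y -= posB->y;  posA->z -= posB->z;
    velA->x -= velB->x;  velA->y -= velB->y;  velA->z -= velB->z;
}
```

Besides simplifying the sweep, this keeps every number in the test on the order of the *distance between* the objects, which is exactly what sidesteps the large-coordinate overflow discussed above.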
#160086 - Rajveer - Thu Jul 10, 2008 9:52 am
I'm doing static tri vs aabb and static tri vs swept obb tests. I was thinking that I should be translating the box to the origin, but one problem is that for static triangles (most of the whole level), I precalculate the edge planes so I can avoid a lot of crosses per frame. If I translate then these edge planes become useless. I guess if I translate the triangle to the origin instead of the box then I can precalculate the edge planes while the triangle is at the origin, then they become useful again. Should I just translate any of the tri's vertices to the origin?
#160120 - a128 - Thu Jul 10, 2008 5:21 pm
Rajveer wrote: |
A128: Currently my ships are around 1.5 units and the test track is over 1000*1000. Reason being that in the world fast futuristic-racing games, a 1000*1000 track isn't large at all ;) I won't be going REALLY big, but I don't want to limit myself, as later on I may want to create some large single-lap courses. |
I still don't get it. Sorry.
OK, your ship is 1.5 units and your track is 1000*1000 units.
So why not change the scale instead? Divide the ship by 10:
your ship becomes 0.15 units, and your track is effectively 10x bigger.
#160125 - sajiimori - Thu Jul 10, 2008 6:13 pm
It should be ok to store edge planes relative to the triangle's first vertex, and move that vertex to the origin during tests.
If you put the box at the origin, you could adjust the edge plane offsets based on the box's center in the world (the normals wouldn't change), but I suspect this wouldn't be any easier.
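The offset adjustment sajiimori describes follows from plane algebra: translating geometry by -t leaves a plane's normal untouched and only shifts its constant by dot(n, t). A sketch with plain int planes (in the engine these would be f32 values; names are invented):

```c
/* Plane in the form nx*x + ny*y + nz*z = d */
typedef struct { int nx, ny, nz, d; } Plane;

/* Plane equation after translating the world by (-tx, -ty, -tz):
   the normal is unchanged, the offset becomes d - dot(n, t). */
static Plane translate_plane(Plane p, int tx, int ty, int tz) {
    p.d -= p.nx * tx + p.ny * ty + p.nz * tz;
    return p;
}
```

So precomputed edge planes stay usable: one subtraction per plane re-anchors them at the new origin, with no cross products redone at runtime.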
#160193 - Rajveer - Fri Jul 11, 2008 10:48 am
I guess you're right a128, I remember a while ago I tried scaling down and the accuracy went down in some areas, but that was with my old crappy engine where I had A LOT of problems. I'll try scaling down and moving my collision tests to the origin; this way I won't have to mess around with any 64-bit calculations.
I'll let you know how I get on, cheers guys!
#160222 - zeruda - Sat Jul 12, 2008 5:34 am
Quote: |
Well currently I'm translating and scaling all of my geometry on load to fit into the v16 (4_12) type, and when collision testing against collidable models I multiply by scale and add the translation variables. It works pretty fast, but I'm going to be using display lists soon so I can get rid of that v16 limit (although I'll probably keep this as storing vertices as v16 saves space). |
I think your approach is fundamentally wrong. The native DS vertex format is 1.3.12 fixed point. If you are using display lists (which you should) then that is the format they will be in. There is no option for more numerical accuracy than that when displaying objects with display lists. That gives you values +7.9 to -7.9. You can of course use glTranslate to stitch sections together, but that will only get you so far.
Where I think you are going wrong is that you are scaling up your geometry and calculating with big values. This is what is getting you into territory where you need 64-bit numbers, which slows things down. You need to think small.
What I would do (what I am doing, actually) is store all 3D objects as display lists in 1.3.12. I then store them again separately as an array of polygons for calculations, but in 1.7.24 fixed point. This gives a range of +127.9 to -127.9. Because the display-list values range between +-8, multiplying two points gives a maximum of +-64, which means 32-bit multiplications fit with room to spare. And because I use 24 fractional bits, that gives over 16 million steps of accuracy compared to 4096 in the lists, so divisions also work very nicely, and it is enough precision for any calculations necessary for things like physics and collision.
The only other thing to note is that the DS hardware uses 1.19.12 for things like its matrices. So if you are going to plug values into something like glTranslate you'll need to convert them to that format first (I dunno if there's any way to change how the engine handles things like MATRIX_LOAD4x3 and alter the format it works at?).
Here's some code snippets that I use:
Code: |
#define intf24(n) ((n) << 24)
#define f24int(n) ((n) >> 24)
#define f32f24(n) ((n) << 12)
#define f24f32(n) ((n) >> 12)
#define f24v16(n) ((n) >> 12)
#define f24t16(n) ((t16)((n) >> 20))
#define f24v10(n) ((v10)((n) >> 15))
#define NUMBEROFSHIFTS 24
// Fixed Point versions
// Fixed point divide
// Takes 8.24 numerator and denominator and returns 8.24 result
static inline int32 divf24(int32 num, int32 den)
{
    DIV_CR = DIV_64_32;
    while (DIV_CR & DIV_BUSY);
    DIV_NUMERATOR64 = ((int64)num) << NUMBEROFSHIFTS;
    DIV_DENOMINATOR32 = den;
    while (DIV_CR & DIV_BUSY);
    return DIV_RESULT32;
}
// Fixed point multiply
// Takes 8.24 values and returns 8.24 result
static inline int32 mulf24(int32 a, int32 b)
{
    long long result = (long long)a * (long long)b;
    return (int32)(result >> NUMBEROFSHIFTS);
}
// Fixed point square root
// Takes 8.24 fixed point value and
// returns the fixed point result
static inline int32 sqrtf24(int32 a)
{
    SQRT_CR = SQRT_64;
    while (SQRT_CR & SQRT_BUSY);
    SQRT_PARAM64 = ((int64)a) << NUMBEROFSHIFTS;
    while (SQRT_CR & SQRT_BUSY);
    return SQRT_RESULT32;
}
// Trig Functions 1.7.24 fixed point
// Cross product
// x = Ay * Bz - By * Az
// y = Az * Bx - Bz * Ax
// z = Ax * By - Bx * Ay
static inline void crossf24(int32 *a, int32 *b, int32 *result)
{
    result[0] = mulf24(a[1], b[2]) - mulf24(b[1], a[2]);
    result[1] = mulf24(a[2], b[0]) - mulf24(b[2], a[0]);
    result[2] = mulf24(a[0], b[1]) - mulf24(b[0], a[1]);
}
// Dot Product
// result = Ax * Bx + Ay * By + Az * Bz
static inline int32 dotf24(int32 *a, int32 *b)
{
    return mulf24(a[0], b[0]) + mulf24(a[1], b[1]) + mulf24(a[2], b[2]);
}
// Normalize
// Ax = Ax / mag
// Ay = Ay / mag
// Az = Az / mag
static inline void normalizef24(int32* a)
{
    // magnitude = sqrt ( Ax^2 + Ay^2 + Az^2 )
    int32 magnitude = sqrtf24( mulf24(a[0], a[0]) + mulf24(a[1], a[1]) + mulf24(a[2], a[2]) );
    a[0] = divf24(a[0], magnitude);
    a[1] = divf24(a[1], magnitude);
    a[2] = divf24(a[2], magnitude);
}
|
#160225 - sajiimori - Sat Jul 12, 2008 7:11 am
Hey zeruda, you can probably cut down on some pointer aliasing in crossf24 by calculating the results as local variables, then storing them to the destination at the end. Check the compiler output.
(Incidentally, that would also allow passing one of the source pointers as the destination.)
Unless the caller is in ITCM, you might also want to uninline the larger functions (or at least have an uninlined version available) and put them in ITCM, to avoid spending a lot of time fetching instructions.
#160226 - a128 - Sat Jul 12, 2008 8:02 am
For me the native NDS format is 1.19.12,
and I use this format for all my fixed-point calculations that would otherwise be float.
If tests like < 0 fail when they shouldn't, I test against an epsilon (0.001) instead.
BTW, if you use box tests then the vertex resolution should be +7.98/2 to -7.98/2
#160247 - Rajveer - Sat Jul 12, 2008 8:32 pm
Scaling geometry down by a factor of 10 works, and so does translating collisions to the origin, so I may stick with this method for the project, as development has gone pretty far!
Zeruda, when you store values in 1.7.24, these are the same values as in 1.3.12 format but kept in a 32-bit integer for accuracy during collisions, right? This method intrigues me; the only problem is that vertex information will be twice as large memory-wise, as currently I store it in 1.3.12. I guess what I could do is use display lists (which are in 1.3.12 format, as you said) and for collisions also keep vertices stored in 1.3.12 but use them as 1.7.24 during tests. Then I lose accuracy when actually converting my model files into this format compared to storing them as 1.7.24, but at least this inaccuracy is the same as the inaccuracy in the display lists.
#160250 - zeruda - Sat Jul 12, 2008 9:12 pm
Quote: |
Then I lose accuracy when actually converting my model files into this format compared to storing them as 1.7.24, but at least this inaccuracy is the same as with the inaccuracy in the display lists. |
You are not "losing" anything. Your models are in 16 bit anyway. All that is happening is that you are gaining 32-bit accuracy for your calculations.
Quote: |
This method intrigues me, the only problem is that vertex information will be twice as large memory wise as currently I store them in 1.3.12. |
Yes, there is a cost to storing vertices twice, but that happens anyway as soon as you start using display lists (which are the way to go for 3D). And actually I store the list of vertices in 32 bit for collisions (which triples the cost). But if you store them as 16 bit, then there will be a CPU cost in converting them to 32 bit.
So the question is, are you really running out of space on the 4MB of WRAM?
sajiimori wrote: |
Hey zeruda, you can probably cut down on some pointer aliasing in crossf24 by calculating the results as local variables, then storing them to the destination at the end. Check the compiler output.
(Incidentally, that would also allow passing one of the source pointers as the destination.)
Unless the caller is in ITCM, you might also want to uninline the larger functions (or at least have an uninlined version available) and put them in ITCM, to avoid spending a lot of time fetching instructions. |
Thanks for the tips. I don't actually use those last 3 functions currently. All that code I posted is just modified versions of the functions in Math.h in libnds. Maybe these functions should be updated in the next rev of libnds...
#160252 - Maxxie - Sat Jul 12, 2008 9:27 pm
zeruda wrote: |
Quote: | Then I lose accuracy when actually converting my model files into this format compared to storing them as 1.7.24, but at least this inaccuracy is the same as with the inaccuracy in the display lists. |
You are not "losing" anything. Your models are in 16 bit anyway. All that is happening is that you are gaining 32-bit accuracy for your calculations.
|
If you apply operations that have bad numerical conditioning (as some matrix operations do), you lose accuracy when you cut the model data from 32 to 16 bit; even if the final stage only accepts 16-bit precision, you are losing accuracy.
_________________
Trying to bring more detail into understanding the wireless hardware
#160265 - sajiimori - Sun Jul 13, 2008 1:29 am
I'd actually use 16 bit vertices for collisions, if they have enough significant digits to represent your data (which it sounds like they do).
Having to shift the coordinates up a few bits is a small price to pay for reducing cache misses. Storing the data in 16 bits doesn't mean you have to use 16-bit calculations: LDRH automatically gives you a 32-bit value with zeros in the high bits.
Still, you'll want to store your collision data separately, and look for other ways to save memory if at all possible. Collision and rendering have very different needs in terms of optimizing your data structures.
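sajiimori's load-and-widen point can be sketched in C; the helper name is invented, and note that for *signed* v16 vertices the load compiles to LDRSH (sign-extend) rather than the zero-extending LDRH he mentions:

```c
#include <stdint.h>

/* Vertices stay in memory as 16-bit 4.12 values (v16); the load widens
   them to 32 bits for free on ARM, and a shift of 12 converts 4.12 to
   the 8.24 working format discussed above without losing any bits. */
static inline int32_t v16_to_f24(int16_t v) {
    return (int32_t)v << 12;   /* 4.12 -> 8.24 */
}
```

So the 16-bit storage format costs one folded shift per load, while halving the cache footprint of the collision mesh.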
#160277 - Rajveer - Sun Jul 13, 2008 1:53 pm
zeruda wrote: |
You are not "losing" anything. You're models are in 16 bit anyway. All that is happening is that you are gaining 32 bit accuracy for your calculations. |
What I meant is when loading the models for collisions, converting vertices into 1.7.24 gives a more accurate value than 1.3.12. But since display lists are 1.3.12, then storing vertices for collisions also in 1.3.12 makes sense.
zeruda wrote: |
Yes, there is a cost to storing vertices twice, but that happens anyway as soon as you start using display lists (which are the way to go for 3D). And actually I store the list of vertices in 32 bit for collisions (which triples the cost). But if you store them as 16 bit, then there will be a CPU cost in converting them to 32 bit. |
Yup, that's what I meant, storing vertices as 16-bit vs 32-bit. I don't think the CPU cost is too great, as I only test a few triangles per frame; most of them are discarded using aabb vs obb tests with my octree.
sajiimori wrote: |
Still, you'll want to store your collision data separately, and look for other ways to save memory if at all possible. Collision and rendering have very different needs in terms of optimizing your data structures. |
At the moment I have a *.obj loader; each model can be set for collisions, rendering, or both. The obj loader stores an array of vertices, normals, tex coords, and polygons which hold indices into the array of vertices. What data structure would you suggest I use for collision data?
Hmm, I can feel a big rewrite coming on :(
#160284 - silent_code - Sun Jul 13, 2008 5:21 pm
<(Slightly) Offtopic>
Speaking of which, what point-in-triangle-test are you guys using?
There are some basic techniques, e.g. the ">/= 180° total inner angles" test, the "three edge planes" test and others.
I'm just curious, as for lightmap raytracing (offline on a PC) I use the "180°" test and it's not very accurate and simply doesn't seem to be "teh h4k", yet it's still easy to implement...
_________________
July 5th 08: "Volumetric Shadow Demo" 1.6.0 (final) source released
June 5th 08: "Zombie NDS" WIP released!
It's all on my page, just click WWW below.
#160288 - Rajveer - Sun Jul 13, 2008 6:28 pm
Well I don't actually use a point-in-triangle test, I just check whether it's intersected or not without finding the point. I use the separating axis test for obb vs triangle, which is great because usually it fails on the first condition so it's a quick out.
Once I find it intersects, I calculate how deep into the triangle it is and just push it out.
I've come across another problem. When drawing really small objects with the camera right up close to them, they seem to jerk when moving the camera / rotating them. My ship's model is now 0.15*0.07*0.30 units, so in 1.3.12 format it should be represented quite accurately, since the smallest figure we can represent is around 0.0002. Any clue as to why this is? It doesn't happen with the original sized model (well, it does, but so minutely it's not even noticeable except to me, since I was looking for it).
#160304 - a128 - Sun Jul 13, 2008 9:04 pm
Rajveer wrote: |
I've come across another problem. When drawing really small objects with the camera right up close to them, they seem to jerk when moving the camera/rotating them. |
Maybe it's the fixed-point sin/cos functions, which are less accurate than sinf/cosf, OR you move your camera even when the velocity of the camera is very small.
#160336 - sajiimori - Mon Jul 14, 2008 3:47 am
I've settled on a scale of about 10 units for the height of a typical character. I used 40 on my first game which led to overflow troubles, and 5 on my second game which led to precision troubles.
As for formats, I'd store models for rendering as display lists, and models for collision in a super-compact pre-sorted format that can easily be traversed to quickly find nearby parts of the world.
For instance, if you're going to use an octree of triangles for collision, your collision binary should contain the octree itself, so you can just load the binary from the ROM and use it immediately. This opens up more opportunities for optimization at build time.
Taken to an extreme, you can preprocess your collision data so heavily that the resulting binary describes the operations required to do a collision test, rather than a structure describing the contents of the world. You can eliminate polys from the collision data altogether. :)
#160342 - a128 - Mon Jul 14, 2008 8:56 am
Rajveer wrote: |
Well I don't actually use a point-in-triangle test, I just check whether it's intersected or not without finding the point. I use the separating axis test for obb vs triangle, which is great because usually it fails on the first condition so it's a quick out.
Once I find it intersects, I calculate how deep into the triangle it is and just push it out.
|
Sounds interesting... source code or snippet?
I use this for a XZ-PointInTriangle Test
Code: |
static inline Fixed Cross2D(Vec3 a, Vec3 b) {
    return a.z * b.x - a.x * b.z;
}

// for counterclockwise OR clockwise triangles
static bool InsideTriangle2D_(Vec3 vIntersection, Vec3 *Poly) {
    Fixed ab = Cross2D(vIntersection - Poly[0], Poly[1] - Poly[0]);
    Fixed cb = Cross2D(vIntersection - Poly[1], Poly[2] - Poly[1]);
    if (Fixed::sign(ab) != Fixed::sign(cb)) return false;
    Fixed ac = Cross2D(vIntersection - Poly[2], Poly[0] - Poly[2]);
    if (Fixed::sign(ab) != Fixed::sign(ac)) return false;
    return true;
}
// if you have counterclockwise triangles the sign tests are not needed;
// then you just have if (Cross2D() < 0) return false
|
#160374 - ritz - Mon Jul 14, 2008 5:55 pm
Rajveer wrote: |
Once I find it intersects, I calculate how deep into the triangle it is and just push it out. |
I am currently on this last bit myself. Do you just move the BB out a calculated amount along the triangle's plane's unit normal? Or do you find the smallest movement along one of the other sep axes? Or perhaps there's another way? Just finding the smallest x,y,z coord movements?
#160385 - Rajveer - Mon Jul 14, 2008 10:32 pm
For my swept obb vs tri, it goes a lil something like this:
Code: |
int ColTestSweptOBBTri(triangle, obb, moveVector)  // Returns intersect time
{
    // First test the max possible obb's aabb vs the tri's aabb
    // (I store this max poss aabb so no need to calculate it each frame)
    // Then start the SAT tests:

    // 1st: tri's normal
    axis = tri.normal;
    speed = dot(axis, moveVector);
    tri_Range = ColProjectTriOntoAxis(axis, tri);
    obb_Range = ColProjectOBBOntoAxis(axis, obb);
    if (ColTestIsSeparated(tri_Range, obb_Range, speed, t_Max, &t_First, &t_Last)) return NO_INTERSECT;

    // 2nd: 3 tests using the OBB's 3 axes
    // 3rd: 9 tests using crosses between the tri's edges and the obb's normals
    ...
    return intersectTime;
} |
ritz: I calculate how deep the obb is within the triangle and move it out that much (just dotting the obb and tri onto the tri's normal again and finding the largest depth, i.e. tri_Range.max - obb_Range.min). But my collision method is recursive, so if I find an intersect it calls itself again until it doesn't intersect anything. This way contact with multiple triangles is handled and the obb gets pushed out of them all.
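Rajveer's recursive resolve can be sketched as a bounded loop, which also guards against two opposing polys pushing the object back and forth forever; the callback and the toy floor collider here are invented stand-ins for the real intersection test:

```c
#define MAX_BUMPS 5   /* assumed per-frame push limit */

typedef struct { int x, y, z; } Vec3i;

/* Fills 'push' and returns nonzero while 'pos' still intersects something. */
typedef int (*FindPushFn)(const Vec3i *pos, Vec3i *push);

static void resolve_collisions(Vec3i *pos, FindPushFn find_push) {
    for (int bump = 0; bump < MAX_BUMPS; bump++) {
        Vec3i push;
        if (!find_push(pos, &push))
            return;                      /* free of all triangles */
        pos->x += push.x;  pos->y += push.y;  pos->z += push.z;
    }
    /* still stuck after MAX_BUMPS pushes: give up for this frame */
}

/* toy collider for illustration: the plane y = 0 is solid ground */
static int floor_push(const Vec3i *pos, Vec3i *push) {
    if (pos->y >= 0) return 0;
    push->x = 0;  push->y = -pos->y;  push->z = 0;
    return 1;
}
```

The explicit bump limit is the one change from a plain recursion: it converts a potential infinite ping-pong between two triangles into a worst case of a few extra pushes.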
Hmm, I think I will look at rewriting my whole project (again!) just for the sake of creating a good, solid engine which I can reuse for other 3D projects. Everything from collisions to... everything... revolves around models as *.obj files.
So sajiimori, I should treat *.obj (or whatever model file) only as an intermediate format, which I then use to create display lists and collision files with PC-side programs? At least I can use half of my current code to create said programs! :D
I find the preprocessed collision data part interesting, do you have any links or resources that I can use to help build my knowledge on collision optimisations, esp with data structures such as octrees?
This rewrite would be a good chance for me to rethink my approach to huge levels also, the scale I will be using, the accuracy I will use for collisions, and how to solve the problem with small models appearing jerky. All in all a learning experience!
#160393 - sajiimori - Mon Jul 14, 2008 11:36 pm
Yes, I'd use the data exported from your 3D app as an intermediate format that gets compiled into optimized binaries. The intermediate format should be oriented toward easy parsing and human-readability, and it can contain liberal amounts of metadata or whatever might be useful to the build tools.
This book is pretty good, and up-to-date:
http://www.amazon.com/dp/1558607323
Many of the techniques are beyond the scope of typical DS games, but it has a lot of good ideas about space and model partitioning.
Autopartitioning BSP (as used by Quake) is an example of preprocessing the input data so heavily that the resulting data doesn't contain any polys: you just test against a series of planes, and when you're behind all the planes for a given convex solid volume, then you're intersecting it. This is what I do at work, and it's the most efficient approach I know of.
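In its simplest point-vs-convex-solid form, the "behind every plane means intersecting" test sajiimori describes reduces to a sketch like this (plain int planes and invented names; a real swept test would push the planes out by the mover's radius):

```c
/* Convex solid as a list of bounding planes; a point p is "behind" a
   plane when dot(n, p) <= d. */
typedef struct { int nx, ny, nz, d; } BrushPlane;

static int point_in_brush(int x, int y, int z,
                          const BrushPlane *planes, int count) {
    for (int i = 0; i < count; i++) {
        const BrushPlane *p = &planes[i];
        if (p->nx * x + p->ny * y + p->nz * z > p->d)
            return 0;   /* in front of one plane: outside the solid */
    }
    return 1;           /* behind all planes: intersecting */
}
```

No polygon ever appears in the data; the preprocessing step turns the world's triangles into these plane lists once, at build time.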
#160421 - ritz - Tue Jul 15, 2008 2:18 pm
Rajveer wrote: |
I calculate how deep the obb is within the triangle and move it out that much (just dotting obb and tri onto tri's normal again and finding largest depth i.e. tri_Range.max - obb_Range.min). |
Thanks for the insight!
#160522 - zeruda - Thu Jul 17, 2008 6:49 am
Rajveer wrote: |
But my collision method is recursive, so if I find an intersect it calls itself again until it doesn't intersect anything. This way you handle contact with multiple triangles and it gets pushed out of them all. |
Just as long as you avoid endless recursion where one poly pushes the object into a second poly and the second poly pushes it back into collision with the first poly. Heh.
Quote: |
I've come across another problem. When drawing really small objects with the camera right up close to them, they seem to jerk when moving the camera/rotating them. |
Try printing out the values that you are rotating the camera by. Also try a basic incrementing of the camera x position, that should come out smooth and you can go from there.
sajiimori wrote: |
Having to shift the coordinates up a few bits is a small price to pay for reducing cache misses. Storing the data in 16 bits doesn't mean you have to use 16-bit calculations: LDRH automatically gives you a 32-bit value with zeros in the high bits. |
Didn't know that, quite interesting. What is also interesting, having done some more research, is that the ARM instruction set can fold bit shifts into instructions. So that would mean you could load a 16-bit value into a register, converted to 32 bit, and then bit-shift it up to a desired precision in a single-cycle instruction. So if it was a 1.3.12 value, you could bit-shift left 6. That would give a 9.6 value stored in a 1.19.12 format. Thus both divisions and multiplications of two numbers would fit perfectly.
#160531 - TwentySeven - Thu Jul 17, 2008 9:58 am
Yeah after 5 bumps I simply decide to stop moving.
#160534 - silent_code - Thu Jul 17, 2008 10:10 am
You could just scan (within an arc relative to the normal you want to push along) the environment for a valid position a few units away, like RWS have done in Postal2: <*POP*> "Got stuck? Stay away from here!" ;^)
_________________
July 5th 08: "Volumetric Shadow Demo" 1.6.0 (final) source released
June 5th 08: "Zombie NDS" WIP released!
It's all on my page, just click WWW below.