gbadev.org forum archive

G'day :)

I'm in the process of writing a rigid-body dynamics system for a game, but I'm having some trouble with speed.

The main one at the moment seems to be with matrix multiplications (and all other multiplications, for that matter).

All my code is in Fixed Point format, using the libnds functions for mulf32, divf32, and sqrtf32 to utilize the hardware functionality.

I thought maybe it would be possible to use the hardware functions for matrix multiplication via the Geometry interface?

Looking over the NDSTech Wiki (here), I can see there are functions for multiplying matrices, but for the life of me, I cannot think of any way to actually use them.
It seems as though you have to first give it a series of 9 f32 values, then somehow specify a matrix to multiply that with. And then, how do you get the result out? It seems as though it stays in the hardware.

Any help with this would be appreciated.

Also, if anyone can suggest additional speed-ups (DS hardware related) for the calculations primarily used in physics, I would be very thankful!
Areas of interest:
o sqrt function - for Normalizing the Quaternions & Vectors
o Dot Products
o Cross Products
o Grassmann products - Product of two Quaternions
o Quaternion to Matrix Conversion

Thank you :)

Here is some info about the operations I have already:

Quaternion to Matrix Conversion:
2 conversions from float to fixed
13 mulf32's
12 additions

Grassmann Product:
16 mulf32's
12 additions

Cross Products:
6 mulf32's
3 additions

And, finally, some code for you to see what I've already done (all code posted is released under the GNU GPL):

Oh, and the 'SCALAR' type is defined as:
typedef f32 SCALAR;

Code:

// ******** Quaternion Class ********

// Quaternion to Matrix
const MATRIX QUATERNION::toMatrix() const{

    SCALAR one = floatToFixed(1);
    SCALAR two = floatToFixed(2);

    SCALAR r2 = mulf32(two,r);
    SCALAR x2 = mulf32(two,x);
    SCALAR y2 = mulf32(two,y);
    SCALAR z2 = mulf32(two,z);

    SCALAR yy = mulf32(y2,y);
    SCALAR xx = mulf32(x2,x);
    SCALAR zz = mulf32(z2,z);

    SCALAR xy = mulf32(x2,y);
    SCALAR xz = mulf32(x2,z);
    SCALAR xr = mulf32(r2,x);

    SCALAR yr = mulf32(r2,y);
    SCALAR yz = mulf32(y2,z);

    SCALAR zr = mulf32(r2,z);

    VECTOR c0 = VECTOR( one - yy - zz,
                        xy + zr,
                        xz - yr
                      );

    VECTOR c1 = VECTOR( xy - zr,
                        one - xx - zz,
                        yz + xr
                      );

    VECTOR c2 = VECTOR( xz + yr,
                        yz - xr,
                        one - xx - yy
                      );

    return MATRIX(c0, c1, c2);
}

// grassmann product
const QUATERNION QUATERNION::operator * ( const QUATERNION& q ) const{
    return QUATERNION( mulf32(r,q.r) - mulf32(x,q.x) - mulf32(y,q.y) - mulf32(z,q.z),
                       mulf32(x,q.r) + mulf32(r,q.x) + mulf32(y,q.z) - mulf32(z,q.y),
                       mulf32(y,q.r) + mulf32(r,q.y) + mulf32(z,q.x) - mulf32(x,q.z),
                       mulf32(z,q.r) + mulf32(r,q.z) + mulf32(x,q.y) - mulf32(y,q.x)
                     );
}

// cross product
const QUATERNION QUATERNION::cross( const QUATERNION& q ) const{
    VECTOR result;
    result.x = mulf32(y, q.z) - mulf32(q.y, z);
    result.y = mulf32(z, q.x) - mulf32(q.z, x);
    result.z = mulf32(x, q.y) - mulf32(q.x, y);
    return QUATERNION( 0,
                       result.x,
                       result.y,
                       result.z
                     );
}

Code:

// ******** Matrix Class ********

// C is defined as an array of three VECTORs, each with SCALAR members x, y, z
// each element of C (ie, each set of 3 SCALARs) represents a column of the matrix

//post-multiply by a vector
const VECTOR MATRIX::operator * ( const VECTOR& v ) const{
    return( C[0]*v.x + C[1]*v.y + C[2]*v.z );
}

//post-multiply by a matrix
const MATRIX MATRIX::operator * ( const MATRIX& m ) const{
    return MATRIX( (*this) * m.C[0], (*this) * m.C[1], (*this) * m.C[2] );
}

_________________
Nintendo DS & Dominos :: DS Dominos
http://jt0.org

Returning matrices by value is pretty expensive. To get an idea of the work it involves, make a version that takes a destination pointer (ugly, I know), compile both versions using -O2 -S, and look at the differences in both the function and the code that calls it.
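
For illustration, a destination-pointer version might look roughly like the sketch below. The names matrixMultiply and matrixMultiplyByValue are made up for the example, the struct layouts are guessed from the description in the first post, and mulf32 is replaced by a software stand-in so the file also builds off-target. Treat it as a starting point for the comparison, not as a drop-in replacement.

```cpp
#include <cassert>
#include <cstdint>

// Off-target stand-ins (assumptions); on the DS, f32 and mulf32 come from libnds.
typedef int32_t f32;                          // 20.12 fixed point
static inline f32 mulf32(f32 a, f32 b) {
    return (f32)(((int64_t)a * b) >> 12);     // widen, multiply, drop 12 fraction bits
}
typedef f32 SCALAR;

struct VECTOR { SCALAR x, y, z; };
struct MATRIX { VECTOR C[3]; };               // three column vectors

// Hypothetical destination-pointer variant: *out = a * b.
// 'out' must not alias 'a' or 'b'; columns are written while both are still being read.
void matrixMultiply(MATRIX* out, const MATRIX& a, const MATRIX& b) {
    for (int j = 0; j < 3; ++j) {
        const VECTOR& v = b.C[j];             // column j of the right-hand matrix
        out->C[j].x = mulf32(a.C[0].x, v.x) + mulf32(a.C[1].x, v.y) + mulf32(a.C[2].x, v.z);
        out->C[j].y = mulf32(a.C[0].y, v.x) + mulf32(a.C[1].y, v.y) + mulf32(a.C[2].y, v.z);
        out->C[j].z = mulf32(a.C[0].z, v.x) + mulf32(a.C[1].z, v.y) + mulf32(a.C[2].z, v.z);
    }
}

// By-value wrapper, kept only so the two call styles can be compared in the -S output.
MATRIX matrixMultiplyByValue(const MATRIX& a, const MATRIX& b) {
    MATRIX result;
    matrixMultiply(&result, a, b);
    return result;
}
```

Call both from a small test function, compile with -O2 -S, and compare how many loads and stores surround each call site.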

The same may be true of quaternions, to a lesser extent. Check the output for those as well.

Also check the output of toMatrix() to see if the compiler is optimizing out the work of constructing temporary vectors and copying them into the matrix. Compare the output to writing values straight into the matrix, without using the constructor that takes 3 vectors.
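
As a rough sketch of the "write straight into the matrix" variant: quaternionToMatrix is a hypothetical free function, the struct layouts are assumed, and mulf32 is again a software stand-in. It uses the same formulas as the posted toMatrix(), except that the doubling is done by addition instead of mulf32(two, ...), which is my substitution and gives the same result.

```cpp
#include <cassert>
#include <cstdint>

// Off-target stand-ins (assumptions); on the DS, f32 and mulf32 come from libnds.
typedef int32_t f32;                          // 20.12 fixed point
static inline f32 mulf32(f32 a, f32 b) {
    return (f32)(((int64_t)a * b) >> 12);
}
typedef f32 SCALAR;

struct VECTOR { SCALAR x, y, z; };
struct MATRIX { VECTOR C[3]; };               // three column vectors
struct QUATERNION { SCALAR r, x, y, z; };     // r is the scalar part

// Hypothetical variant of toMatrix(): fills *out directly, with no temporary
// VECTORs and no MATRIX constructor call.
void quaternionToMatrix(MATRIX* out, const QUATERNION& q) {
    const SCALAR one = 1 << 12;               // 1.0 in 20.12
    const SCALAR r2 = q.r + q.r;              // doubling by addition, not mulf32
    const SCALAR x2 = q.x + q.x;
    const SCALAR y2 = q.y + q.y;
    const SCALAR z2 = q.z + q.z;

    const SCALAR xx = mulf32(x2, q.x), yy = mulf32(y2, q.y), zz = mulf32(z2, q.z);
    const SCALAR xy = mulf32(x2, q.y), xz = mulf32(x2, q.z), yz = mulf32(y2, q.z);
    const SCALAR xr = mulf32(r2, q.x), yr = mulf32(r2, q.y), zr = mulf32(r2, q.z);

    out->C[0].x = one - yy - zz;  out->C[0].y = xy + zr;        out->C[0].z = xz - yr;
    out->C[1].x = xy - zr;        out->C[1].y = one - xx - zz;  out->C[1].z = yz + xr;
    out->C[2].x = xz + yr;        out->C[2].y = yz - xr;        out->C[2].z = one - xx - yy;
}
```

Whether this beats the constructor-based version depends on what the compiler already optimizes away, so the -S output is still the thing to check.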

These functions are good candidates for ITCM.
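
One way to try that, sketched under a few assumptions: libnds provides an ITCM_CODE macro that tags a function for the ITCM section, and the usual devkitPro makefiles define ARM9 for the ARM9 build. Vec3 and crossProduct are hypothetical names, and the #else branch exists only so the sketch builds on a PC.

```cpp
#include <cassert>
#include <cstdint>

#ifdef ARM9
#include <nds.h>                 // on target: supplies f32, mulf32 and ITCM_CODE
#else
// Off-target stand-ins (assumptions) so the sketch still builds on a PC.
typedef int32_t f32;             // 20.12 fixed point
static inline f32 mulf32(f32 a, f32 b) {
    return (f32)(((int64_t)a * b) >> 12);
}
#define ITCM_CODE                // no TCM off-target, so the tag expands to nothing
#endif

struct Vec3 { f32 x, y, z; };    // hypothetical stand-in for the VECTOR class

// Small, hot, leaf function: the kind of routine worth placing in ITCM.
// 'out' must not alias 'a' or 'b'.
ITCM_CODE void crossProduct(Vec3* out, const Vec3& a, const Vec3& b) {
    out->x = mulf32(a.y, b.z) - mulf32(b.y, a.z);
    out->y = mulf32(a.z, b.x) - mulf32(b.z, a.x);
    out->z = mulf32(a.x, b.y) - mulf32(b.x, a.y);
}
```

ITCM is small, so only the routines that actually show up in profiling are worth moving there.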

I haven't said a word about math, but data shuffling operations are often just as important, and that includes reading opcodes from main memory.

On an unrelated note, a typesafe fixed-point class would clean things up a lot.
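
A minimal sketch of what I mean, assuming 20.12 format as used by f32. Fixed, Vec3, and dot are illustrative names, not anything from libnds, and the multiply uses a software stand-in with the same arithmetic as mulf32. Because there is no implicit conversion from int, mixing raw integers with fixed-point values fails to compile instead of silently giving a wrong result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 20.12 fixed-point wrapper.
class Fixed {
public:
    Fixed() : raw_(0) {}
    static Fixed fromRaw(int32_t raw) { return Fixed(raw); }
    static Fixed fromInt(int32_t n)   { return Fixed(n * (1 << 12)); }
    int32_t raw() const { return raw_; }

    Fixed operator+(Fixed o) const { return Fixed(raw_ + o.raw_); }
    Fixed operator-(Fixed o) const { return Fixed(raw_ - o.raw_); }
    Fixed operator-() const        { return Fixed(-raw_); }
    // Same arithmetic as mulf32: widen to 64 bits, multiply, drop 12 fraction bits.
    Fixed operator*(Fixed o) const {
        return Fixed((int32_t)(((int64_t)raw_ * o.raw_) >> 12));
    }
    bool operator==(Fixed o) const { return raw_ == o.raw_; }

private:
    explicit Fixed(int32_t raw) : raw_(raw) {}   // private: callers must say fromRaw/fromInt
    int32_t raw_;
};

struct Vec3 { Fixed x, y, z; };

// With operators in place, the math reads like the formula it implements.
inline Fixed dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

The quaternion product then becomes r*q.r - x*q.x - y*q.y - z*q.z and so on, with no visible mulf32 calls. Check the compiler output to confirm the wrapper inlines away completely.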

DS development > Optimized (Physics based) Matrix Calculations

#91631 - JessTicular - Sat Jul 08, 2006 1:18 pm

#91684 - sajiimori - Sat Jul 08, 2006 7:22 pm