gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

C/C++ > Multiply 2 N.M fixedpoint values without needing 2N+2M bits?

#120591 - sammyjojo - Sun Mar 04, 2007 10:35 pm

Okay so from the way I understand it, if you have a two fixed-point numbers in N.M format (we'll just assume that the N and M are the same for both numbers) and you mutiply them together, you'll get a fixed-point number in 2N.2M format. For example if I have a 16.16 number and I multiply it with another 16.16 number I'll get a 32.32 number.

I figured out a way to multiple two N.M numbers without needed a 32.32 inbetween number, but it takes too many operations...

Code:

int fractionalMask   = ~(~0 << M);
int magnitude      = value0 >> M;
int rightMagnitude   = value1 >> M;
int fractional      = value0 & fractionalMask;
int rightFractional   = value1 & fractionalMask;

int result = ((magnitude * rightMagnitude) << M)
          + (magnitude * rightFractional)
          + (fractional * rightMagnitude)
          + ((fractional * rightFractional) >> M);


or I can cast the variable to a 8-byte one for the multiplication and then shift down and cast it back to 4-bytes...

Code:

int result = static_cast<int>((static_cast<long long>(value0) * static_cast<long long>(value1)) >> M);


Are these my only alternatives or is a there a better way that I'm just thinking of to where I don't need an intermediate 8-byte value?

This is for some fixed-point stuff that I working on the for the DS, so I was also wondering casting to a long long for intermediate step even work on that system? Being a 32-bit system would that just mean it could access a 64-bit value by taking 2 accesses to get 2 32-bit values that it would magically combine and figure out a way to work with it as a 64-bit value internally? So roughly working with 8-byte values instead of 4-byte ones would take twice as long or am I just completely thinking about this in the wrong way?

#120593 - keldon - Sun Mar 04, 2007 10:56 pm

Casting to a long long is fine in this case.

#120637 - Ant6n - Mon Mar 05, 2007 3:12 am

i.e. like
static inline fixed fmul(fixed a,fixed b){
return (((long long)a * (long long)b)>>16);
}

#120644 - tepples - Mon Mar 05, 2007 3:49 am

But if you put that into Thumb code, it will expand horribly. Thumb doesn't have the extra instructions for long long, as I found out while optimizing the graphics engine in TOD.

On the DS, you can use ARM instructions for a lot more code than you could on the GBA due to the cache.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#120708 - poslundc - Mon Mar 05, 2007 6:43 pm

You can use regular 32-bit multiply so long as you're careful not to overflow your values and can handle the loss in precision.

Say for example you have two numbers A and B that are both 16.16.

Code:
result = (A >> 8) * (B >> 8);


This will provide you with a 16.16 result, with no risk of overflow, but loss of half the precision in both terms.

Code:
result = ((A >> 4) * (B >> 4)) >> 8;


This provides you with a 16.16 result with only a loss of a quarter of precision of the terms, but can overflow if either term is larger than 256.

Obviously wrangling these various possible combinations can be a headache and lead to all kinds of obfuscation of code.

For most gaming applications, I'd say loss of half your precision in the terms (especially if your numbers are 16.16) is preferable most of the time to running the risk of overflow, so for a general fixed multiply operation I'd be more likely to go with something like the former, and special case the few instances where loss of precision is a noticeable detriment to gameplay.

(And as was already mentioned, on the DS you're probably better off staying in ARM and just doing 64-bit multiply on account of the cache.)

Dan.

#120773 - Ant6n - Tue Mar 06, 2007 12:38 am

tepples wrote:
But if you put that into Thumb code, it will expand horribly. Thumb doesn't have the extra instructions for long long, as I found out while optimizing the graphics engine in TOD.

On the DS, you can use ARM instructions for a lot more code than you could on the GBA due to the cache.


ok, then how about

Code:

static inline u32 mul(u32 a, u32 b) {
   #if   defined   ( __thumb__ )
       do 4 multiplies
   #else
       do long long multiply
   #endif
}