gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

DS development > InvSqrt

#160092 - ChronoDK - Thu Jul 10, 2008 2:35 pm

Hi,

I'm slowly porting a demo I made on pc, but I ran into a problem. I was using the Quake 3 InvSqrt function:

Code:

float CVector2D::InvSqrt(float x) {
   float xhalf = 0.5f * x;
   int i = *(int*)&x;
   i = 0x5f3759df - (i >> 1);
   x = *(float*)&i;
   x = x*(1.5f - xhalf*x*x);
   return x;
}


But it does not seem to work on DS. I'm guessing the float format is different. Is there some way to make this magic work?

#160095 - Maxxie - Thu Jul 10, 2008 2:50 pm

The float format is the same. The standrad is IEEE 754

The code however is some really dirty speedhack. Not really readable.

What is InvSqrt intended to do?
If it is the Squareroot of the inverse, then replace it with sqrt(1/x). (Surely slower, but float is slow on the DS anyways)

You should however get away from floats anyways, as they are caclulated via software emulation (not hardware as on PC) and thus hell of slow.
_________________
Trying to bring more detail into understanding the wireless hardware

#160098 - elhobbs - Thu Jul 10, 2008 2:56 pm

you could always use ;)

Code:

float CVector2D::InvSqrt(float x) {
    return 1.0/sqrt(x);
}

are you sure that it is not working? this is only one iteration of the newton method so it may not be as accurate as you require.

#160099 - elhobbs - Thu Jul 10, 2008 2:59 pm

Maxxie wrote:
What is InvSqrt intended to do?
If it is the Squareroot of the inverse, then replace it with sqrt(1/x).
in q3 it isued in calulations to normalize vectors so it the inverse of the square root not the square root of the inverse. so 1.0/sqrt(x).

#160100 - Maxxie - Thu Jul 10, 2008 3:05 pm

which is the same:

(1/x)^y = (1^y)/ (x^y) = 1 / (x^y)
with y = 0.5 for square root

:D ... and that newton iteration, yes that could be what i was missing on the reading, however it also simplifies a bit the different float parts. It shouldn't be valid for a lot of possible x values (i.e. negative x should result into intermixing the exponent with the sign (bitshifting one right), but then negatives arn't "rootable" anyways, but same happen on exponent/mantisse limit)
_________________
Trying to bring more detail into understanding the wireless hardware

#160105 - elhobbs - Thu Jul 10, 2008 3:22 pm

Maxxie wrote:
which is the same:

(1/x)^y = (1^y)/ (x^y) = 1 / (x^y)
with y = 0.5 for square root
hides head in shame...

if you were using the hardware sqrt though I am not sure that would give the same result, or would it?

#160107 - Maxxie - Thu Jul 10, 2008 3:29 pm

There might be slight differences at the end of the mantisse due to the different direction the roundings are influencing the result.

The acuracy should be the same however, as the inverse allways operates as far away from the neutral exponent (^1 -> exp=128) as the original
_________________
Trying to bring more detail into understanding the wireless hardware

#160108 - elhobbs - Thu Jul 10, 2008 3:41 pm

I meant the ds hardware sqrt

#160109 - ChronoDK - Thu Jul 10, 2008 3:48 pm

I'm using the InvSqrt in another function where I normalize a vector.

That hardware sqrt sounds nice, but that works with fixed point only right?

#160110 - elwing - Thu Jul 10, 2008 3:54 pm

ChronoDK wrote:
I'm using the InvSqrt in another function where I normalize a vector.

That hardware sqrt sounds nice, but that works with fixed point only right?


yes, but you should never use something that is not fixed point, the DS don't manage floating points, it's wholly emulated and DAMN slow...

#160112 - Maxxie - Thu Jul 10, 2008 3:58 pm

ah,

well since it is an 64bit integer sqrt i'd would allways take the expression that is >=1 out of 1/x or x. Multiply it as long with 4 until the i64 representation has none or only one leading zero left. After the hw sqrt convert back to float and divide as many times with 2 as you multiplied with 4 before.

Then apply the inverse if it is still missing (if you took x)

That should give the best accuracy on the ds hw sqrt.

if you keep within an exponent between (64-23) above ^1 (so from ^40 to ^1) you should have as good or better accuracy as with float emulation. Outside this range, your accuraly should be less or equal.

:edit: removed sign reserve in register width - mantisse width bit
:edit2: it's not needed to care about -exp, so only need ^+ range, and take the full 40 possible shifts for positives
_________________
Trying to bring more detail into understanding the wireless hardware

#160117 - Maxxie - Thu Jul 10, 2008 4:41 pm

Half-Pseudocode for the suggested method.

Code:

float SqrtAbove1(float x)
{
   /* the exp bits */
   signed char exp = (signed char) (*(int *)&x & 0x7F800000) ;   
   /* actual exponent is stored negative */
   exp = -exp ;
      
   /* if we have an exp <23 we have a fraction < 1, we move shift it until we have no such fraction anymore */
   /* since we have to reverse this operation later and can't do halfbit shifts, we align this correction at 2 */
   if (exp < 23)
   {
      expCorrection = (exp + 1) & ~1 ;
   } else
      expCorrection = 0 ;
      
   /* get all bits into the ordinal */
   u64 i = x * pow(2,expCorrection) ;

   /* occupy as much bits as possible, but only do 2-shifts, for later reverse */   
   int count = 0 ;
   while (!(i & 0xC000000000000000))
   {
      i <<= 2 ;
      count++ ;
   }
      
      
   /* do actual sqrt */
   SQRT_PARAM = i ;
   SQRTCNT  = 8001 ;
   while (SQRTCNT & 0x8000) ;
   i = SQRT_RESULT ;
      
   /* reverse shifts */
   i >>= count ;
      
   /* reverse expCorrection */
   x = (float)i / pow(2,(expCorrection >> 1) ;
   
   return x ;
}

float InvSqrt(float x)
{
   if (x<=0) return NaN ;
   if (x>=1)
   {
      x = SqrtAbove1(x)
      /* and apply Inverse */
      x = 1 / x ;
   } else
   {
      /* apply inverse first */
      x = SqrtAbove1(1/x) ;
   }
}


The shift/reverse shifting is based on

sqrt(x) = (sqrt(4) / 2) * sqrt(x) = sqrt(4*x) / 2
->
sqrt(x) = (sqrt(4^n) / 2^n) * sqrt(x) = sqrt(4^n*x) / 2^n


The * or / pow(2,...) in the float can be fastened by direct bit manipulating the exp part of the float.
_________________
Trying to bring more detail into understanding the wireless hardware