#171888 - ritz - Thu Dec 31, 2009 3:37 pm
Multiplying by the reciprocal instead of dividing is a well-known trick, but I thought I'd post my DS version of it for those who are interested. Compared to the libnds normalizef32(), the code below is more than 2x faster on the hardware:
Code:
STATIC_INL
void v_fnormalizef32 (int32* a)
{
    int32 m;

    m = sqrtf32(dotf32(a,a));

    // get reciprocal, not using divf32()
    // DIV_32_32 is half the clks: numerator = 1
    REG_DIVCNT = DIV_32_32;
    while (REG_DIVCNT & DIV_BUSY) ;
    REG_DIV_NUMER_L = 4096 << 12;
    REG_DIV_DENOM_L = m;
    while (REG_DIVCNT & DIV_BUSY) ;
    m = REG_DIV_RESULT_L;

    // multiply reciprocal instead of 3 divides
    a[0] = mulf32(a[0],m);
    a[1] = mulf32(a[1],m);
    a[2] = mulf32(a[2],m);
}
I'm posting here instead of supplying a patch because there is a very tiny precision hit using this method.
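To get a feel for the size of that precision hit off the hardware, here is a minimal host-side sketch in plain C. It is not part of the posted routine: `fx_mul`, `fx_div` and `recip_error` are hypothetical names, and the two helpers only assume that libnds `mulf32()`/`divf32()` behave like a truncating 20.12 multiply and divide. It compares dividing one component by the length against multiplying it by the truncated reciprocal.

```c
#include <stdint.h>
#include <stdlib.h>

/* Host-side stand-ins for libnds mulf32()/divf32() on 20.12 values.
   Assumption: both truncate the way the plain C below does
   (>> on a negative value is implementation-defined; the examples
   here only use positive inputs). */
static int32_t fx_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 12);
}

static int32_t fx_div(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * 4096) / b);
}

/* Difference, in units of 1/4096, between dividing a component by the
   vector length and multiplying it by the truncated reciprocal. */
static int32_t recip_error(int32_t c, int32_t len)
{
    int32_t exact = fx_div(c, len);
    int32_t recip = (4096 << 12) / len;   /* 1.0 / len in 20.12 */
    return abs(exact - fx_mul(c, recip));
}
```

With these helpers, a component of 3.0 in a vector of length 5.0 (`recip_error(12288, 20480)`) comes out identical both ways, and a component of 1.5 over a length of 1.5 differs by one unit. The gap widens for long vectors, because the reciprocal of a large length keeps only a few significant bits: a component of 100.0 over a length of 100.0 (`recip_error(409600, 409600)`) is off by 96 units in this model. So the hit looks tiny for vectors near unit length and less so for large ones.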
I actually played around with that magic inverse sqrt hack but it was only a tad faster. The hardware sqrt on the DS is very, very good. Here's my (unused) inverse sqrt code for fun:
Code:
STATIC_INL
int32 v_finvsqrtf32 (int32 f32x)
{
    union { float f; int32 i; } u;

    // initial guess from the float bit pattern
    u.f = f32tofloat(f32x);
    u.i = 0x5f375a86 - (u.i >> 1);
    int32 uff32 = floattof32(u.f);

    // one Newton-Raphson step: y * (1.5 - (x/2) * y * y); 6144 is 1.5 in 20.12
    return mulf32(uff32,(6144-mulf32((f32x>>1),mulf32(uff32,uff32))));
}
This would be a lot faster if I had the time/smarts to figure out how to do it in fixed point instead of going through floats (there's actually code in Clutter that figured this out to some degree). Obviously, the f32tofloat() conversion is what's dragging it down. Even so, this code is really quite fast.
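One possible shape for a float-free version, as a rough sketch only: seed the estimate with a power of two picked from the position of the highest set bit, then run the same Newton-Raphson step a few times. This is not the Clutter code and has not been timed on a DS; `fx_mul` and `fx_invsqrt` are hypothetical names, `fx_mul` assumes `mulf32()` is a truncating 20.12 multiply, and the iteration count of 5 is a conservative choice for this crude seed.

```c
#include <stdint.h>

/* 20.12 multiply, assumed to mirror libnds mulf32(). */
static int32_t fx_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 12);
}

/* Hypothetical helper: 1/sqrt(x) for a 20.12 value x, integers only.
   Returns 0 for x <= 0 (an arbitrary choice for this sketch). */
static int32_t fx_invsqrt(int32_t x)
{
    if (x <= 0)
        return 0;

    /* n = index of the highest set bit, i.e. floor(log2(x)). */
    int n = 0;
    for (uint32_t t = (uint32_t)x; t > 1; t >>= 1)
        n++;

    /* Bit 12 is 1.0, so x is roughly 2^e with e = n - 12.
       Seed y = 2^(-ceil(e/2)), which keeps x*y*y in [0.5, 2). */
    int e = n - 12;
    int32_t y = (e >= 0) ? (4096 >> ((e + 1) / 2))
                         : (4096 << ((-e) / 2));

    /* Newton-Raphson: y = y * (1.5 - 0.5 * x * y * y). */
    for (int i = 0; i < 5; i++) {
        int32_t half_xyy = fx_mul(x, fx_mul(y, y)) >> 1;
        y = fx_mul(y, 6144 - half_xyy);   /* 6144 is 1.5 in 20.12 */
    }
    return y;
}
```

For example, `fx_invsqrt(16384)` (4.0) returns 2048 (0.5), and `fx_invsqrt(8192)` (2.0) lands within a couple of units of 2896 (about 0.707). The bit-scan loop could likely be swapped for a count-leading-zeros instruction, and precision drops for large inputs, where the result has only a few significant bits.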
Anyway, happy new year to all :) Don't drink and drive!