#175985 - Rajveer - Mon Mar 14, 2011 11:47 pm
I've got a project that compiles in VC++ for Windows and on the DS. Some pre-game calculations are done in floating point for accuracy, but I'm finding that I'm getting small differences in the results across the platforms, which add up. Are there any tips that you guys can give for keeping the floating point precision the same?
Bit of a vague question I know as I'm not showing any source, but just wondering if there's something obvious that I should be doing first.
#175987 - kusma - Tue Mar 15, 2011 9:21 am
x86 has an internal precision of 80 bits in its floating-point registers, and this might be what's hitting you. Try setting the floating-point model to "strict" under Project Settings -> Configuration Properties -> C/C++ -> Code Generation in Visual Studio (this goes for Visual Studio 2008). It might be something else, but it's worth a shot ;)
#175988 - Miked0801 - Tue Mar 15, 2011 4:20 pm
For DS, I would say don't use floats, use fixed point numbers. You control the precision better and processing them requires much less CPU time.
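To make Miked0801's suggestion concrete, here's a minimal 20.12 fixed-point sketch (the format the DS hardware uses); the type and helper names are illustrative, not from any particular library:

```c
#include <stdint.h>

/* 20.12 fixed point: 20 integer bits, 12 fractional bits (the DS convention) */
typedef int32_t fix12;

#define FIX_SHIFT 12
#define INT_TO_FIX(n)   ((fix12)((n) * (1 << FIX_SHIFT)))
#define FLOAT_TO_FIX(f) ((fix12)((f) * (1 << FIX_SHIFT)))

/* multiply two 20.12 values, widening to 64 bits so the product can't overflow */
static fix12 fix_mul(fix12 a, fix12 b)
{
    return (fix12)(((int64_t)a * b) >> FIX_SHIFT);
}

/* divide two 20.12 values (b must be non-zero) */
static fix12 fix_div(fix12 a, fix12 b)
{
    return (fix12)(((int64_t)a << FIX_SHIFT) / b);
}
```

Because this is pure integer arithmetic, VC++ and the DS compiler produce bit-identical results, which is exactly the reproducibility property being asked about.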
#175989 - Rajveer - Tue Mar 15, 2011 9:22 pm
kusma: Thanks for the suggestion, I tried changing the floating point model to both strict and fast, but am still getting different values unfortunately.
Miked0801: CPU time isn't an issue as this is for a level editor which runs on both platforms, as long as it generates the same data on either platform I'm happy! Which I guess is actually an argument for fixed point arithmetic as it should be platform agnostic and produce the same data. However I don't like leaving a problem unsolved, so I'd like to fix this, it's all a learning experience for me :)
So just as a test, I wrote:
Code: |
float tempFloat = 0.007f;
printf("tempFloat %.40f", tempFloat); |
What's outputted:
Win: tempFloat 0.0070000002160668373000000000000000000000
DS : tempFloat 0.0070000002160668373107910156250000000000
I'm in no way worried about that much precision, but I'm not understanding why they're not representing 0.007f in the same way, if float is 32bits on both systems and I'm forcing a strict floating point model?
#175990 - Ruben - Tue Mar 15, 2011 9:49 pm
If I'm not mistaken, you only use strict OR fast, not both [the last one used takes preference so if you use "-a -b" then 'b' would take preference] as these are mutually exclusive when it comes to speed vs. accuracy.
#175991 - Rajveer - Tue Mar 15, 2011 9:51 pm
Sorry I wrote that a bit confusingly, I meant I tried precise, then strict and finally fast.
#175993 - Ruben - Wed Mar 16, 2011 5:38 am
Hm, and they all output the same data? Try the following:
Code: |
printf("raw data = %08X\n", *(u32*)(&tempFloat)); |
If the data differs from Windows to the DS, there's something else going on.
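A side note on that cast: `*(u32*)(&tempFloat)` works on these compilers but technically violates C's strict-aliasing rules. A memcpy does the same job portably; this sketch uses uint32_t in place of the forum's u32 typedef:

```c
#include <stdint.h>
#include <string.h>

/* return the raw IEEE-754 bit pattern of a float without aliasing issues */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* compilers optimise this to a plain register move */
    return u;
}
```

Used as `printf("raw data = %08X\n", float_bits(tempFloat));` it should print 3BE56042 for 0.007f on any IEEE-754 platform.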
#175995 - sverx - Wed Mar 16, 2011 10:05 am
Rajveer wrote: |
What's outputted:
Win: tempFloat 0.0070000002160668373000000000000000000000
DS : tempFloat 0.0070000002160668373107910156250000000000 |
Try outputting them in hex, maybe they are equal, just got different output in decimal conversion...
#175996 - Cearn - Wed Mar 16, 2011 10:41 am
Rajveer wrote: |
So just as a test, I wrote:
Code: | float tempFloat = 0.007f;
printf("tempFloat %.40f", tempFloat); |
What's outputted:
Win: tempFloat 0.0070000002160668373000000000000000000000
DS : tempFloat 0.0070000002160668373107910156250000000000
I'm in no way worried about that much precision, but I'm not understanding why they're not representing 0.007f in the same way, if float is 32bits on both systems and I'm forcing a strict floating point model? |
It could simply have to do with the way that printf prints the numbers, rather than a difference in the float itself. Even a double will only have about 16 decimal places. If you ask it to print more, it could just be printing whatever's in the bytes following the float. Or one printf implementation could try to properly extend the binary representation into decimal, and the other could just fill the rest with zeroes.
As Ruben suggests, comparing the bytes making up the floats might be more fruitful.
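As an aside, instead of asking printf for 40 digits you can print just enough digits to uniquely identify the float. C11's FLT_DECIMAL_DIG (9 for IEEE single precision) is guaranteed to round-trip; a quick sketch:

```c
#include <float.h>   /* FLT_DECIMAL_DIG (C11; it's 9 for IEEE-754 single) */
#include <stdio.h>

/* print a float with exactly enough digits to recover it with strtof/sscanf */
static void print_roundtrip(float f)
{
    printf("%.*g\n", FLT_DECIMAL_DIG, f);
}
```

Anything past those 9 significant digits is implementation noise, which is why the two platforms' 40-digit prints diverged while the floats themselves were identical.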
#175998 - Rajveer - Wed Mar 16, 2011 1:49 pm
Ah excellent, the hexadecimal values are the same. I also went through one of the "troublesome" algorithms and found that the values in between calculations are also the same. Thanks for that guys!
Interestingly, I did find a bug. At the end of the algorithm I add up a bunch of values, which is where it looks like there was a discrepancy between the platforms:
Code: |
result[Z] = (float)((OneMinusTCubed*((float)spline->knotArray[knot].pos[Z]))/4096
+ (ThreeXOneMinusTSquaredXT*((float)spline->knotArray[knot].outVec[Z]))/4096
+ (ThreeXOneMinusTXTSquared*((float)spline->knotArray[nextKnot].inVec[Z]))/4096
+ (TCubed*((float)spline->knotArray[nextKnot].pos[Z]))/4096); |
result[Z] in hex is:
Win: C5AFBD82
DS: C5AFBD81
If I add the values one by one as follows:
Code: |
result[Z] = (float)((OneMinusTCubed*((float)spline->knotArray[knot].pos[Z]))/4096);
result[Z] += (float)((ThreeXOneMinusTSquaredXT*((float)spline->knotArray[knot].outVec[Z]))/4096);
result[Z] += (float)((ThreeXOneMinusTXTSquared*((float)spline->knotArray[nextKnot].inVec[Z]))/4096);
result[Z] += (float)((TCubed*((float)spline->knotArray[nextKnot].pos[Z]))/4096);
|
Both the DS and Win produce C5AFBD81. Would this likely be due to internal rounding being different on the platforms or something?
#175999 - Cearn - Wed Mar 16, 2011 3:37 pm
It could be rounding and/or numerical instability.
The addition itself will be done at a higher precision (64 or 80 bit?). In the single-expression version, the intermediates are kept at that higher precision, so the result should be more accurate. In the statement-by-statement version, each intermediate is cast back down to float, which rounds off the excess bits.
It's similar to this:
Code: |
int i= 0.5 + 0.5 + 0.5; // 1.5 -> 1
vs
int i= 0.5; // 0
i += 0.5; // 0
i += 0.5; // 0
|
So the way you add could influence the end result. If the PC uses larger intermediaries, some difference in the lower bits could be a likely result.
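This intermediate-precision effect is easy to reproduce. In the sketch below (values chosen specifically to force the difference; it assumes float arithmetic actually rounds to 32 bits, as on SSE or ARM), the same three additions give different answers depending on whether the running total stays in double or is rounded to float at every step:

```c
/* each += rounds the running total back to 32-bit float precision */
static float sum_as_float(void)
{
    float r = 1.0f;
    r += 0x1p-25f;   /* less than half an ulp of 1.0f: rounded away each time */
    r += 0x1p-25f;
    r += 0x1p-25f;
    return r;        /* still exactly 1.0f */
}

/* one expression evaluated in double, rounded to float once at the end */
static float sum_as_double(void)
{
    double r = 1.0 + 0x1p-25 + 0x1p-25 + 0x1p-25;   /* 1 + 1.5 * 2^-24 exactly */
    return (float)r;                                 /* rounds up to 1 + 2^-23 */
}
```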
Also watch out for numerical instability.
You can get numerical instability if the range of the values differ a lot. In general, you'll want to add up all the smaller values first. For example, suppose you want to add 4, 2000, 4, 4, but you only have 3 significant figures.
Code: |
4.00 4.00
2.00e3 + 4.00 +
---------- -----------
2.00e3 8.00
4.00 + 4.00 +
---------- vs ----------
2.00e3 12.0
4.00 + 2.00e3 +
---------- ----------
2.00e3 2.01e3
|
Note that in this case it's only a small difference, but when you start using other operations (particularly subtraction), this can severely fuck up your calculations.
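The same ordering effect shows up in real 32-bit floats, without needing to imagine 3-significant-figure arithmetic. Near 2^24 a float's ulp is 2, so adding 1.0f one at a time is a no-op, while adding the small values together first preserves them:

```c
/* 2^24 is the first float whose ulp is 2, so += 1.0f rounds back down every time */
static float big_first(void)
{
    float s = 16777216.0f;              /* 2^24 */
    s += 1.0f; s += 1.0f; s += 1.0f;
    return s;                           /* still 16777216 */
}

static float small_first(void)
{
    float s = 1.0f + 1.0f + 1.0f;       /* 3, exact in float */
    s += 16777216.0f;                   /* 2^24 + 3 rounds to even: 16777220 */
    return s;
}
```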
#176006 - relminator - Thu Mar 17, 2011 11:11 am
Rajveer wrote: |
Ah excellent, the hexadecimal values are the same. I also went through one of the "troublesome" algorithms and found that the values in between calculations are also the same. Thanks for that guys!
Interestingly, I did find a bug. At the end of the algorithm I add up a bunch of values, which is where it looks like there was a discrepancy between the platforms:
Code: | result[Z] = (float)((OneMinusTCubed*((float)spline->knotArray[knot].pos[Z]))/4096
+ (ThreeXOneMinusTSquaredXT*((float)spline->knotArray[knot].outVec[Z]))/4096
+ (ThreeXOneMinusTXTSquared*((float)spline->knotArray[nextKnot].inVec[Z]))/4096
+ (TCubed*((float)spline->knotArray[nextKnot].pos[Z]))/4096); |
|
Hmnn.... Parallel/Frenet Frames?
_________________
http://rel.betterwebber.com
#176008 - Rajveer - Thu Mar 17, 2011 5:09 pm
Cearn: That's really interesting, definitely something I'll keep in mind from now on. Thanks!
relminator: Bezier curve interpolation :)
#176010 - relminator - Fri Mar 18, 2011 5:52 am
Rajveer wrote: |
Cearn: That's really interesting, definitely something I'll keep in mind from now on. Thanks!
relminator: Bezier curve interpolation :) |
You're trying to do time-constant beziers?
What I did was to sample it at a certain distance for each point on the curve. Load that sample and lerp from there.
#176018 - Rajveer - Mon Mar 21, 2011 1:25 pm
So did you have an array for each curve that maps t to distance along the curve or something?
#176031 - relminator - Thu Mar 24, 2011 8:22 am
Yep.
Code: |
s32 Bezierf32( s32 p0, s32 p1, s32 p2, s32 p3, s32 t )
{
    s32 bez;

    // A = p1
    // B = p0
    // C = p3
    // D = p2
    s32 b   = (1 << 12) - t;
    s32 b2  = (b * b) >> 12;
    s32 b3  = (b2 * b) >> 12;
    s32 t2  = (t * t) >> 12;
    s32 t3  = (t2 * t) >> 12;
    s32 b2t = ( b2 * t ) >> 12;
    s32 bt2 = ( b * t2 ) >> 12;

    bez = p1 * b3 +
          ( (3 * p0 * b2t) ) +
          ( (3 * p3 * bt2) ) +
          p2 * t3;

    return bez;
}

s32 BezierfDifferential( s32 p0, s32 p1, s32 p2, s32 p3, s32 t )
{
    s32 _t2  = (t * t) >> 12;
    s32 _2t  = (2 * t);
    s32 _3t2 = (3 * _t2);

    s32 a = p0 * ( (1<<12) - (4*t) + _3t2 );
    s32 b = p1 * ( _2t - (1<<12) - _t2 );
    s32 c = p2 * _t2;
    s32 d = p3 * ( _2t - _3t2 );

    return 3 * ( (a + b + c + d) >> 12 );
}

inline s32 Bezier::GetT()
{
    sassert(Lut, "Table not loaded!");

    while( d > Lut[Index+1].d )
    {
        Index += 1;
    }

    s32 d1 = Lut[Index].d;
    s32 d2 = Lut[Index+1].d;
    s32 t1 = Lut[Index].t;
    s32 t2 = Lut[Index+1].t;

    // s32 Lf32 = ( ((d - d1) << 12 ) / (d2 - d1));
    s32 Lf32 = divf32( (d - d1), (d2 - d1) );

    return ( t1 + ( ( Lf32 * (t2 - t1) ) >> 12 ) );
}
|
The array is generated on the PC by a tool.
For a 256x192 screen, a sample of 32 elements should be enough.
_________________
http://rel.betterwebber.com
#176075 - Rajveer - Wed Mar 30, 2011 4:42 pm
Thanks for that. So Bezierf32() finds the point on the curve at time t for whichever dimension you pass into it, and ::GetT() interpolates a distance between the samples to calculate the time t. What does BezierfDifferential() do?
Do you have any hints on a fast way to find the closest point on a Bezier curve to a given point? Currently I'm estimating t by creating a line segment between the beginning and end of the curve and finding the closest point on this line to the given point. I work out t from that and plug it in, but this doesn't work as well as I thought it would.
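For reference, the chord trick described above comes down to a point-to-segment projection like this (2D float sketch; note that t along the chord is only an approximation of the curve parameter, which may be why it works less well than expected):

```c
/* estimate t by projecting point q onto the chord from curve start s to end e */
static float chord_t(float sx, float sy, float ex, float ey, float qx, float qy)
{
    float dx = ex - sx, dy = ey - sy;
    float len2 = dx*dx + dy*dy;         /* squared chord length */
    if (len2 == 0.0f)
        return 0.0f;                    /* degenerate chord */
    float t = ((qx - sx)*dx + (qy - sy)*dy) / len2;
    if (t < 0.0f) t = 0.0f;             /* clamp to the segment */
    if (t > 1.0f) t = 1.0f;
    return t;
}
```

A common refinement is to use this only as a first guess and then iterate (e.g. a few Newton steps on the distance function, using the curve's derivative), since arc length is not proportional to t.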
#176614 - GLaDOS - Sat Aug 27, 2011 3:06 am
Reproducible floating-point results are in general a nightmare, due to the boneheaded design of the x87 FPU.
From what I've read, there are several measures you can take.
* Force all variables to be loaded to and from memory. This is generally what the strict math compiler option does. It reduces errors; however, there are still double-rounding problems, so math is still not reproducible. Also, it's slow.
* Set the FPU precision to 64 bits. This will prevent most ordinary errors. However, the exponent range is still too large, so you can get errors in some rare edge cases. One problem with this is that other programs and libraries (such as DirectX) will often reset the FPU precision, so you constantly have to set it back.
* Use SSE. SSE registers have true 64-bit precision. However, they have no denormal support, if that's important to you. It's also less portable.
The most foolproof solution of course is to use fixed point or arbitrary precision arithmetic. This gives better accuracy and perfect reproducibility at the expense of speed.