gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies. A new forum can be found here.

C/C++ > got sine?

#114007 - Ant6n - Tue Jan 02, 2007 6:21 pm

hi,
I needed sin/cos, which somehow wasn't there, so I decided to implement it myself. Since floats are slow and I like fixed-point numbers, I calculate sin and cos with 16.16 numbers, with a full circle rotation being 256.0. I use 3rd-degree approximations (which I got from Excel, since Matlab gave me beef). I only checked at a couple of points since I don't have a Linux environment around during vacation to test the actual precision, but it should be somewhere around 10~15 bits (it's exact on n*90°). I think this is usable for most game-related activities, and not too bad considering there is no LUT at all (with a small one, precision could be increased, but meh). The whole thing could be done faster in asm, especially with multiply-accumulates; right now I have to shift all the time to keep results of multiplies within the 4.3e9 (2^32) limit.

Code:

typedef s32 fixed; //16.16 fixed point number

//aux
//sin256 (360° -> 256), in range 0-32 (0°-45°)
//gives 16.16 bit fixed number, should have 12~15 bit precision (??)
inline static s32 sin32(u32 x){
   u32 x2 = ((x>>5)*(x>>6));
   return (((x>>5)*51529)>>16)           //51589.9392   x
        - (((x2>>8)*(152))>>22)          //-152.001577  x^2
        - (((x2>>15)*((x*1221)>>16))>>19)//-1224.065679 x^3
        ;
}
//cos256 (360° -> 256), in range 0-32 (0°-45°)
//gives 16.16 bit fixed number, should have 12~15 bit precision (??)
inline static s32 cos32(u32 x){
    u32 x2 = ((x>>5)*(x>>6));
    return  (((x>>4)*22414)>>24)             //22414.36
          - (((x2>>11)*2659)>>17)            //-2659.188736
          + (((x2>>16)*((x*1894)>>15))>>21)  //1887.638127
          + 65536;
}

//aux2
//cos256, sin256 as fcos and fsin (360° -> 256)
//gives 16.16 bit fixed number, should have 10~15 bit precision (??)
fixed fcos(fixed x){
    s32 neg = (x & 0x800000);// neg true if x [128..256]
    if (x & 0x400000){       // if x is in range [64..128]
      x = 128*65536-1-x;     // symmetry trick, compensate for off-by-one error
      neg = neg^0x800000;    // negate neg
    }
    x = x & 0x3fffff;        // get x in range (0-64)*2^16
    s32 y;
    if (x & 0x200000)        // if x is in range [32..64]
      y = sin32(0x400000-x);
    else
      y = cos32(x);
    if (neg) return -y;      // return
    return y;
}
fixed fsin(fixed x){
    return fcos(x-64*65536); // not optimized anyway....
}
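Not part of the original post: a self-contained harness to sanity-check the routines above at a few known 16.16 values. It reproduces the posted code verbatim, with `stdint.h` types standing in for the GBA `s32`/`u32` typedefs.

```c
#include <stdint.h>

typedef int32_t  s32;   /* stand-ins for the GBA typedefs */
typedef uint32_t u32;
typedef s32 fixed;      /* 16.16 fixed point number */

/* sin over 0..32 units (0°-45°), result in 16.16 */
static s32 sin32(u32 x){
   u32 x2 = ((x>>5)*(x>>6));
   return (((x>>5)*51529)>>16)
        - (((x2>>8)*(152))>>22)
        - (((x2>>15)*((x*1221)>>16))>>19);
}
/* cos over 0..32 units (0°-45°), result in 16.16 */
static s32 cos32(u32 x){
    u32 x2 = ((x>>5)*(x>>6));
    return  (((x>>4)*22414)>>24)
          - (((x2>>11)*2659)>>17)
          + (((x2>>16)*((x*1894)>>15))>>21)
          + 65536;
}
static fixed fcos(fixed x){
    s32 neg = (x & 0x800000);    /* true if x in [128..256) units */
    if (x & 0x400000){           /* fold [64..128) onto [0..64) */
      x = 128*65536-1-x;
      neg = neg^0x800000;
    }
    x = x & 0x3fffff;
    s32 y;
    if (x & 0x200000)            /* fold [32..64) onto sin of [0..32) */
      y = sin32(0x400000-x);
    else
      y = cos32(x);
    return neg ? -y : y;
}
static fixed fsin(fixed x){
    return fcos(x-64*65536);     /* sin(x) = cos(x - 90°) */
}
```

At 45° (32 units) the exact 16.16 value is 46341; the approximation lands within a handful of ULPs of that, consistent with the claimed 10~15 bits.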

#114013 - Miked0801 - Tue Jan 02, 2007 8:03 pm

Ok, using a mathematical (Taylor?) series to calc sin/cos for you. Off-hand, I'd say coding this in asm won't help much. All you are doing is shifts, muls, and adds.

That said, algebraically, you could make things clearer.

(x>>5) * (x>>6) = x/32 * x/64 = (x*x) / 2048 = (x*x) >> 11, which is one less shift, but may lose precision.

Similar tricks can be had for the other muls/shifts.

BUT! A lookup table (even stored in ROM) is going to be much, much faster than the best this code could ever do. We aren't on a modern PC system here where memory bus wait times kill you.

#114015 - Ant6n - Tue Jan 02, 2007 8:29 pm

no, no Taylor. Taylor is perfect when using infinitely many terms; every term is an error correction to the previous terms. Trig Taylor series use odd (sin) or even (cos) powers only. So I figure if I use powers 0-3 I can get more error-correcting terms in there, and I can cook up my own because I am only in the range 0-45 deg.
I know the algebra isn't clear, it's a hack. If I used (x*x)>>11, then the biggest possible number for x, 32.0 (32*2^16), would produce 2^42, which doesn't fit in a 32-bit number. All the weird shifts are there to keep the maximum possible precision w/o overflow.
A LUT (a plain LUT w/o interpolation etc.) would need several KB to get similar precision in general, and for small numbers the precision would be really bad. In multiboot, people don't have that much space.
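The overflow bound can be checked directly; a minimal sketch (helper name is mine, not from the thread):

```c
#include <stdint.h>

/* Largest argument: 32.0 in 16.16, i.e. x = 32 << 16 = 2^21.
   x*x needs 42 bits, so (x*x)>>11 would overflow a u32, while the
   pre-shifted (x>>5)*(x>>6) peaks at exactly 2^31 and still fits. */
static int fits_in_u32(uint64_t v) {
    return v <= 0xFFFFFFFFull;
}
```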

#114017 - Miked0801 - Tue Jan 02, 2007 9:53 pm

A LUT at 32 bits of accuracy would take 64 entries * 4 bytes = 256 bytes of memory, plus some code to flip on sectors. What am I missing here?

As you are flipping at 45 degrees, you could reduce this even further. I'm not questioning the usefulness of extremely efficient memory usage as I come from ye olde GB/DMG land where memory/storage was precious to the nibble. But, your code properly compiled would probably take more ROM/RAM space than a simple lookup table.

Another thought is, just how much precision do you really need? Floats/doubles give only very limited precision as it is and fixed point math tends to nibble your precision every time you need to anything beyond and add/subtract.

#114027 - Ant6n - Wed Jan 03, 2007 12:40 am

When I talk about precision I mean how many bits of the approximation's result agree with the actual value, counting from the first bit that is actually set. This is of course difficult to measure for small numbers, since we are talking fixed point here. I.e., if you have a LUT of 64 values and you want sin(0.499°), then with a LUT you round down and look up sin(0°), which is going to give you 0. The actual value should be 0.00872. Since one value is 0, it is difficult to quantify the error. So let's assume you want sin(1.499°); then you get sin(1°)=0.017, when you actually want 0.026. Here you are ~50% off, so only the first bit is valid, meaning your precision is 1 bit. Now take fsin(1.499° * (256/360) * 2^16) = fsin(60538) = 0.023; now you are 15% off, which is like 3 bits of precision. I think the problem here is that all the shifts before the multiplies screw up the precision of the argument (that's why I say it'd be better to use asm with mla in 64 bit).
If you take sin(20.499°), you'd get 0.350191 (actual), 0.3420 (LUT), 0.350159 (poly approx.), which is 5.4 vs. 13.4 bits of precision.

In my first post I used "precision" to mean how many bits are the same as the actual result, counting from the decimal (radix) point, which led me to believe the precision is 10~15 (I now realize for some special cases it's smaller, e.g. for sin(1.5°) it's only 8.3).
The LUT would have precision 6.8~14.7~16, where 6.8 is for the argument 0.5°, 14.7 is for 89.5° (where the whole off-by-0.5 business causes the smallest error), and 16 is on all naturals (since we have a 16.16 number).
This doesn't look like much of a difference, but if you start looking at all possible numbers, the difference becomes quite obvious.

#114031 - Miked0801 - Wed Jan 03, 2007 1:26 am

Ok, you are looking for in-between degree values. Got it. Never had need for anywhere near that sort of theta precision on anything I've ever worked on. :)

Go get'em!

#114032 - Ant6n - Wed Jan 03, 2007 1:32 am

true. oh well. I needed this when porting some raycaster from VisualHAM to devkitPro to see whether it optimizes better, but I guess I could've just always rounded the view vector to the next natural angle.
Sometimes people ask about trig in this forum, so the "search" function oughta help now.
Besides that, I think the standard libs should have highly optimized fixed-point math functions.

#114034 - kusma - Wed Jan 03, 2007 1:46 am

My sin-implementation has a LUT of 129 16 bit elements, and maintains a precision of 14 bits for all fixed16 input values (period of 1 in fixed16). It uses simple linear interpolation, but a higher degree of interpolation could be used to greatly reduce the LUT if needed. I haven't needed that yet, though, I'm happy with 258 bytes of ROM usage.

[edit: calculated too much space usage ;)]

#114035 - Peter - Wed Jan 03, 2007 2:39 am

kusma wrote:
My sin-implementation has a LUT of 129 16 bit elements, and maintains a precision of 14 bits for all fixed16 input values

hmh, the sin-table I used in a rasterizer project was pretty huge. I used a precision of 14 bits (started with 20.12, but the objects started to twitch when they were far away, so I increased the precision). The array had 1<<14 int elements, but I didn't see the need to size-optimize it.

What is the advantage of having a small sin-table? Plus, sin/cos don't really need to be the fastest routines on earth, and the code above doesn't look critical to me, since you basically just use them when setting up some matrices (unless you work with dynamic geometry), or do I make a mistake here?

#114036 - kusma - Wed Jan 03, 2007 2:44 am

Peter wrote:

What is the advantage of having a small sin-table?

Depends on the system. On the GBA I would assume people care about the size to keep applications below multiboot-sizes. For cached systems, cache-hit-rates can be a factor. For hardware-implementations, die-area is important to keep in mind.

Peter wrote:

What is the advantage of having a small sin-table? Plus, sin/cos don't really need to be the fastest routines on earth, and the code above doesn't look critical to me, since you basically just use them when setting up some matrices (unless you work with dynamic geometry), or do I make a mistake here?


You are absolutely right, performance on sin/cos is almost never an issue in reality unless something _really_ stupid has been done ;)

#114037 - Ant6n - Wed Jan 03, 2007 2:49 am

...or something really weird. I think that in some cases trigonometry can do interesting things, under simplifying assumptions.
I just started this thing to see how well one could do without any LUT at all. I looked at the asm output; it's not very fast.

#114040 - kusma - Wed Jan 03, 2007 3:37 am

Ant6n wrote:
...or something really weird. I think that in some cases trigonometry can do interesting things, under simplifying assumptions.
I just started this thing to see how well one could do without any LUT at all. I looked at the asm output; it's not very fast.


nope, you have a lot of arithmetic there; especially the multiplications can be tricky to get fast. But it's a fun little attempt ;)