gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies. A new forum can be found here.

ASM > Directional Fire

#167280 - headspin - Sat Mar 07, 2009 6:32 pm

We're looking at implementing directional fire in a game a friend and I are working on in asm. We have the following C code to work from.

To calculate the angle from the alien to the ship:

Code:
# Alien
x1 = 128
y1 = 128

# Ship
x2 = 132
y2 = 132

deltax = x2 - x1
deltay = y2 - y1

angle_rad = atan2(deltay,deltax)


And then to move the bullet towards the ship from the alien

Code:
# alien starting point
x0 = 1.0
y0 = 1.5

# theta is the angle (in radians) of the direction in which to move
theta = pi/6

# r is the distance to move
r = 2.0

deltax = r * cos(theta)
deltay = r * sin(theta)

# new point
x1 = x0 + deltax
y1 = y0 + deltay


I've found an implementation of atan2 in asm on these boards which we can use. But is there a way to do the calculation without atan2? Also avoiding the divide would be nice. So basically any suggestions for an algorithm that would be faster in arm asm.
_________________
Warhawk DS | Manic Miner: The Lost Levels | The Detective Game

#167281 - Miked0801 - Sat Mar 07, 2009 6:42 pm

atan and atan2 can be implemented as a table lookup. There are also a number of algorithms that approximate the function quickly.

Also, where's the divide?

#167284 - headspin - Sat Mar 07, 2009 7:13 pm

A lookup table, good idea. The thing is, it's not like sin and cos, which take a single parameter, so how big would the table need to be if X is 0-256 and Y is 0-192? It seems like it would be a rather large table.

Does anyone have a table we could use? I could write a program to generate one quite easily if I knew how it should be structured so that it's easy to look up.

Sorry, I realised the divide (pi/6) is not required; it's just part of the example. Theta is the same as angle_rad as calculated in the first code snippet.
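On the table-size question, one common way to keep an atan2 table small is to index it by the ratio of the smaller delta to the larger one instead of by (X, Y) directly, and to recover the rest of the circle by reflection. The sketch below is only an illustration under stated assumptions: the name atan2_lut, the 17-entry table and the 512-steps-per-circle angle unit (chosen to match a 0-511 sine table) are not from this thread, and the table values are round(atan(i/16) * 256 / pi) worked out by hand. It still needs one divide per call.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table: atan(i/16) for i = 0..16, in units of 512 per circle.
   Covers 0 to 45 degrees; everything else is obtained by reflection. */
static const uint8_t atan_lut[17] = {
    0, 5, 10, 15, 20, 25, 29, 34, 38, 42, 46, 49, 52, 56, 59, 61, 64
};

/* Angle of the vector (dx, dy) in the range 0..511.
   0 points along +X, 128 along +Y. Returns 0 for the zero vector. */
int atan2_lut(int dx, int dy)
{
    int ax = abs(dx);
    int ay = abs(dy);
    int a;

    if (ax == 0 && ay == 0)
        return 0;

    if (ax >= ay) {
        /* 0..45 degrees: ratio ay/ax scaled to 0..16, rounded to nearest */
        a = atan_lut[(ay * 16 + ax / 2) / ax];
    } else {
        /* 45..90 degrees: use the swapped ratio and mirror about 45 degrees */
        a = 128 - atan_lut[(ax * 16 + ay / 2) / ay];
    }

    if (dx < 0)
        a = 256 - a;   /* mirror into the left half-plane */
    if (dy < 0)
        a = 512 - a;   /* mirror into the lower half-plane */

    return a & 511;    /* wrap 512 back to 0 */
}
```

With only 17 entries the result is coarse (a few units out of 512); a longer table or interpolation between entries would tighten that up at the cost of more memory or cycles.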

#167291 - Ruben - Sun Mar 08, 2009 8:48 am

Actually, you don't need a massive table for sin/cos; you just store a quarter of a sine wave and then do the reflections manually. E.g.:
Code:
//In a separate file calculated in GCC (not for ARM)

#define PI 3.1415926535897932384626433832795

#include <math.h>
#include <stdio.h>

int main(void) {
    FILE *myFile = fopen("sin.s", "wt"); //Omit the T in a Linux environment
    fprintf(myFile, ".section .rodata\n.align 2\n.global _SinTable\n\n_SinTable:\n");

    for(int i = 0; i < (512/4) ; i++) { //512 / 4 to get only a quarter (rescaling sine to a 0-511 range)
        unsigned int myVal = (unsigned int)(sin((float)i*2*PI / 512) * (1<<14) + 0.5);
        if(i&7) fprintf(myFile, ", 0x%04X", myVal);
        else fprintf(myFile, "\n.hword 0x%04X", myVal);
    }
    fclose(myFile);
}


Then you make your own sin/cos functions like this...
Code:
s32 Sine(s32 Theta) {
    extern s16 _SinTable[512/4];

    s32 Mult = 1;
    Theta &= 512-1; //Wrap into one full period (0-511)

    //Calculate reflections
    //Second half of the period is the first half with the sign flipped
    if(Theta >= (512/2)) { Mult = -1; Theta -= 512/2; }
    //Second quarter is the first quarter read backwards
    if(Theta > (512/4)) Theta = (512/2) - Theta;
    //The peak (a quarter turn) sits one past the end of the LUT
    if(Theta == (512/4)) return Mult * (1<<14);

    //Return
    return (s32)(_SinTable[Theta] * Mult);
}
Just please don't use that code directly.. I'm not sure if it's correct.. I only just woke up and I have a headache so don't trust everything I say. >.>"

Oh and about atan2..
You can either use your own implementations or use the BIOS call.

#167313 - headspin - Sun Mar 08, 2009 8:38 pm

Thanks Ruben but I already have a sin and cos table. I'm interested in how to generate one for atan2.

P.S. I don't think there is an atan2 BIOS function on the NDS, only on the GBA.

#167318 - Cearn - Sun Mar 08, 2009 11:34 pm

Unless you need the angle for something else (like rotating a turret to a certain direction) you probably don't need the arctan. Just get the deltas and normalize that vector to the desired speed.
Code:
// NOTE: assuming 12bit fixed point here.
dx= x2-x1;
dy= y2-y1;
norm= divf32(speed, sqrtf32(dx*dx+dy*dy)>>6);

vx = dx*norm>>12;
vy= dy*norm>>12;
If you include the set-up and safeties, the NDS hardware division costs about 80 cycles. Unless you do several thousand of them per frame, I'm not sure you should be that concerned about it.
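For anyone reading without libnds to hand, the same normalization can be sketched in portable C. Treat this as an illustration only: isqrt64 and aim_velocity are made-up names, plain 64-bit arithmetic stands in for divf32 and sqrtf32, and it assumes the deltas and the speed are already 20.12 fixed point.

```c
#include <stdint.h>

#define FIX_SHIFT 12
#define FIX_ONE   (1 << FIX_SHIFT)

/* Integer square root (rounded down) of a 64-bit value, one bit pair at a time. */
static uint32_t isqrt64(uint64_t v)
{
    uint64_t res = 0;
    uint64_t bit = (uint64_t)1 << 62;

    while (bit > v)
        bit >>= 2;
    while (bit != 0) {
        if (v >= res + bit) {
            v -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)res;
}

/* dx, dy and speed are 20.12; the velocity written to vx/vy is 20.12 too.
   A zero-length delta yields a zero velocity instead of dividing by zero. */
void aim_velocity(int32_t dx, int32_t dy, int32_t speed,
                  int32_t *vx, int32_t *vy)
{
    /* Squared length is 40.24, so its integer square root is 20.12 again. */
    uint64_t len2 = (uint64_t)((int64_t)dx * dx + (int64_t)dy * dy);
    int32_t len = (int32_t)isqrt64(len2);
    int32_t norm;

    if (len == 0) {
        *vx = 0;
        *vy = 0;
        return;
    }

    norm = (int32_t)(((int64_t)speed * FIX_ONE) / len);   /* speed / length */
    *vx = (int32_t)(((int64_t)dx * norm) / FIX_ONE);
    *vy = (int32_t)(((int64_t)dy * norm) / FIX_ONE);
}
```

The result can be added to the bullet position every frame, so the square root and divide are only paid once, when the shot is fired.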

If you really do need an arctan, consider looking here. It can be done in roughly 100 cycles on the NDS even without trying very hard.

#167388 - Flash - Tue Mar 10, 2009 7:15 pm

Thanks to all for their help on this :)

We did get there in the end, though I had to keep everything in line with the rest of the sprite code (which is straight integer).

The workaround was to use the idea from Cearn to create fixed point delays on the x/y axes and +/- speeds. This has worked a treat. It is accurate to a T!

Code:
    @ ok, now to work out where to shoot???
    @ r4/r5 = player X/Y
    @ r6/r7 = Alien X/Y
    @ r12 = shot speed
    mov r10,r12             @ Store the X/Y speeds
    mov r11,r12             @ we will need r12 later

    cmp r5,r7
    rsble r11,r11,#0
    suble r9,r7,r5
    subgt r9,r5,r7
    cmp r4,r6
    rsble r10,r10,#0
    suble r8,r6,r4
    subgt r8,r4,r6
    cmp r8,r9
    bmi directOddQuad
    push {r0-r2}
    mov r0,r8               @ divide this number
    add r9,r12              @ we also need to divide by the SPEED
    mov r1,r9               @ by this number
    bl divf32               @ r0=result 20.12
    mov r9,r0               @ move the whole to r9
    mov r8,#0
    pop {r0-r2}
    b directDone
directOddQuad:
    push {r0-r2}
    mov r0,r9               @ divide this number
    add r8,r12              @ we also need to divide by the SPEED
    mov r1,r8               @ by this number
    bl divf32               @ r0=result 20.12
    mov r8,r0               @ move the whole to r8
    mov r9,#0
    pop {r0-r2}
    b directDone


So, the result is that r10,r11 hold the signed x/y speed and r8,r9 hold a 20.12 fixed point delay on the update of x/y.
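For readers who find C easier to follow, here is one possible reading of the routine above. It is a sketch, not the game's code: the Aim struct, aim_at and fix_div are hypothetical names, fix_div is a portable stand-in that assumes divf32 returns (numerator * 4096) / denominator, and the speed is assumed to be positive so the denominator can never be zero. How the delay is consumed by the sprite code is not shown in the post, so it is left out here.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t speed_x, speed_y;   /* r10, r11: signed per-axis speed */
    int32_t delay_x, delay_y;   /* r8, r9: 20.12 delay, 0 on the main axis */
} Aim;

/* Stand-in for divf32 (assumption): quotient returned as 20.12. */
static int32_t fix_div(int32_t num, int32_t den)
{
    return (int32_t)(((int64_t)num * 4096) / den);
}

/* px/py = player position (r4/r5), ax/ay = alien position (r6/r7). */
Aim aim_at(int32_t px, int32_t py, int32_t ax, int32_t ay, int32_t speed)
{
    Aim a;
    int32_t adx, ady;

    assert(speed > 0);   /* assumption: keeps the denominators non-zero */

    /* Sign of each speed and absolute delta on each axis (the cmp/rsble/sub runs) */
    a.speed_y = (py <= ay) ? -speed : speed;
    ady       = (py <= ay) ? ay - py : py - ay;
    a.speed_x = (px <= ax) ? -speed : speed;
    adx       = (px <= ax) ? ax - px : px - ax;

    if (adx < ady) {
        /* directOddQuad: Y is the main axis, X gets the delay */
        a.delay_x = fix_div(ady, adx + speed);
        a.delay_y = 0;
    } else {
        /* X is the main axis, Y gets the delay */
        a.delay_y = fix_div(adx, ady + speed);
        a.delay_x = 0;
    }
    return a;
}
```

With the alien at (128, 128) and the ship at (132, 132) from the first post, both speeds come out positive and the X axis is treated as the main one.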

It is not perfect, but it is quick and works a dream with the existing sprite code, so... I am very happy :)