#24234 - DiscoStew - Fri Jul 30, 2004 7:22 am
WOW!
This will be my first post in the assembly thread, and I hope to come back here many times as I will have an assembly class in college this coming semester.
Anyways, I've used the "Peter-Teichmann" unsigned divide algorithm in my projects, and it has worked really well for me. However, I did this through inline assembly.
Now I am trying to stray from using inline assembly as much as possible with using external pure asembly code and linking that into my c code. I does work, but now I am trying to add the signed divide function, and it isn't working. In VBA, it brings up "Unsupported ARM mode 00." I used the exact code that was supplied, which I'll show here with my own name for the function...
Code: |
@Asm_SDivide(s32 dividend, s32 divisor)
Asm_SDivide:
ands r3,r1,#0x80000000
rsbmi r1,r1,#0
eor r3,r3,r0,asr#1
cmp r0,#0
rsbmi r0,r0,#0
mov r12,r14
bl Asm_Divide
tst r3,#0x80000000
rsbne r0,r0,#0
tst r3,#0x40000000
rsbne r1,r1,#0
movs pc,r12 |
Now, being that this function uses the unsigned divide, I'm not sure which function has the error. The code is just like on his website, except for main label names and hexidecimal numbers (#0x80000000, not #&800000000). Can anyone help me? thx
PS: Be aware that I practically know no assembly except for a few things here and there, enough to where I got FDIV working as inline assembly through copy, paste, and minor editing.
_________________
DS - It's all about DiscoStew
Last edited by DiscoStew on Fri Jul 30, 2004 9:03 pm; edited 2 times in total
#24237 - FluBBa - Fri Jul 30, 2004 9:02 am
It would be nice to also see the "Asm_Divide" routine, if you happen to use r3 and/or r12 in that one.
Is your function called from Thumb mode? if so you should probably use "bx r12" instead of "movs 12" when exiting.
_________________
I probably suck, my not is a programmer.
#24238 - wintermute - Fri Jul 30, 2004 9:04 am
If you're calling the function from thumb code then you need an interworking variant of the return. Instead of "mov pc.lr" you need "bx lr" which returns the processor to the mode it was in when the call was made
#24251 - DiscoStew - Fri Jul 30, 2004 5:25 pm
Yes, I was calling it from Thumb. I made the change to "bx lr", and the error window didn't appear. However, I wasn't getting any return value from the function. Then I thought that before branching into the AsmDivide function, I stored whatever data that was in r14 into r12 (return address? I'm still learning), so before "bx lr", I added "mov r14,r12" to move the stored data rom r12 back into r14, and now it is working. Even though it does work now, is that how I should have done it?
_________________
DS - It's all about DiscoStew
#24253 - poslundc - Fri Jul 30, 2004 5:55 pm
LR is short for Link Register, and is a synonym for r14 (which is the link register).
So yes, you need to restore the value of r14 if it has changed since the beginning of your subroutine.
In fact, if you plan on calling your routine from C, you need to preserve all of the registers except r0-r3 and r12.
Other registers you generally shouldn't ever touch are r13 (SP - stack pointer) and r15 (PC - program counter).
Dan.
#24260 - DiscoStew - Fri Jul 30, 2004 9:00 pm
NOTE: the "quest" in "ArcTan2 quest" was meant to be "question"
Thanks for the info, but now I have another assembly problem that maybe someone could help me with. I found this ArcTan2 ASM function somewhere (most likely here in the forums due to a Divide BIOS call), but it doesn't work, or at least I don't know how to use it properly. Here it is... Code: |
@Asm_ArcTan2(s32 y, s32 x)
Asm_ArcTan2:
mov r3, r1 @backup
cmp r0, #0
blt _xlt0
movs r2,r1
rsbmi r2,r2,#0 @abs(y/r1)
add r1, r0, r2
mov r1, r1, lsr #8
cmp r1, #0
beq _error
sub r0, r0, r2
swi 0x60000
mov r2, #1280
sub r0,r2,r0
b _xnlt0
_xlt0:
movs r2,r1
rsbmi r2,r2,#0 @abs(y/r1)
sub r1, r2, r0
mov r1, r1, lsr #8
cmp r1, #0
beq _error
add r0, r0, r2
swi 0x60000
mov r2, #1792
sub r0,r2,r0
_xnlt0:
movs r3, r3
rsbmi r0,r0,#0 @invert if y<0
bx lr
_error:
mov r0,#0 @DIV BY 0 ERROR - what to do here?
bx lr |
Does anyone know where this came from, or how it should work? Does it return Radians or Degrees? Are the parameters FIXED 24:8 or something else?
I'd use the BIOS ArcTan2 function, except that must also use the Divide BIOS call, but since I'm using Peter-Teichmann's divide function, I planned to incorporate that into an ArcTan2 function, preferably this one. Any help would be appreciated. If this does work, is there a possibility that it could be edited so that instead of returning 0-359 degrees, it would return 0-511 degrees (512 degrees equaling 2π)? It would help to keep angles within limits by ANDing 0x01FF (511) to it. It is very simple with sin and cos because those are LUTs, but this isn't. thx for any help givin.
EDIT: I did find it here. Lupin posted it here. Does it really work Lupin? If so, how do you use it, as I am not very knowedgeable in ASM at the moment to see how it works. Could it be easily converted to a range of 0-511 degrees?
_________________
DS - It's all about DiscoStew
#24262 - ecurtz - Sat Jul 31, 2004 12:48 am
There is a much better atan2 function using cordic rotators posted in the forums here. You should implement that in assembly instead of trying to do one with divide (unless you need extreme precision.) My raycaster only uses 512 distinct angles for its tables and I was able to reduce the algorithm to 14 iterations and keep enough precision for my stuff. As a bonus you get the distance function nearly for free if you want it.
<edit> I'm not actually POSITIVE that the cordic rotator is faster, but it seems really likely - no divides. I'll post my "optimized" C version when I get home if anybody wants to compare it to BIOS or do an ASM version. </edit>
#24274 - DiscoStew - Sat Jul 31, 2004 8:36 am
I searched the forums for cordic rotators, and I only found this with a post by AnthC on cordic rotators. I went and copied that code into a Windows project to see if it worked, and the resulting number was some number far up in value. I even did a Google search and came up with this website which looks like it could give me a result in Polar Coordinates (Distance and Theta), but the theta from Arctan2 at point (10,10) brought up 188743680. Am I doing something wrong? Either look at it on the website, or I can post it here.
If it wouldn't be too much trouble, I'd like to see your method. Both the 512 angles and free Distance is exactly what my program could really use, the 512 angles so that a simple "&= 511" keeps the angles within limits, and my program get the Distance from the origin to point (X,Y) right after a call to the ArcTan2. I am currently using the BIOS call ArcTan2 to get an angle, but if your function is faster, I'd switch over in a second.
_________________
DS - It's all about DiscoStew
#24282 - ecurtz - Sat Jul 31, 2004 3:30 pm
Here is the page with the paper I read. Not necessary unless you care how it works, but it does have implementations of other functions. I guess I lied about having the GBA version on this machine, but hopefully this Java/pseudo-code with comments should be good enough. Otherwise let me know and I'll get the GBA version up ASAP.
Code: |
/* this is what you set to determine your output range */
/* currently is just 1 << 26 for scaled radians, but could be */
/* something like (256/PI) << 16 to get a 16:16 fixed "angle" */
/* with 512 "degrees" in a circle */
static int atan2Scale = (1 << 26);
static int atan2Table[];
static {
/* adjust the size based on accuracy required */
/* only using _8_ steps in current test, not 18 */
/* this should be calculated on the computer and stored with tables */
atan2Table = new int[18];
for (int i=0; i < 18; i++)
{
double tmp = Math.pow(2.0,(double)-i);
tmp = Math.atan(tmp);
atan2Table[i] = (int)(tmp * (double)atan2Scale);
}
}
static int atan2(int y,int x)
{
int tmp = 0;
int ang = 0;
int shift = 0;
/* 'normalise' for accuracy */
if (x < 0) {
x = -x; y = -y;
ang = (int)(Math.PI * atan2Scale); // obviously a pre-computed constant
}
tmp = x;
if ( y < 0) {
tmp |= -y;
}
else {
tmp|=y;
ang=-ang;
}
/* better to use a constant shift if you know your range */
while((tmp & 0xFE000000) == 0) {
tmp <<= 1; shift++;
}
x <<= shift;
y <<= shift;
/* 8 steps seems ok for 512 separate "angles" */
for (int i = 0; i < 8; i++) {
if (y < 0) {
ang -= atan2Table[i];
tmp = x - (y>>i);
y += (x>>i);
}
else {
ang += atan2Table[i];
tmp = x + (y>>i);
y -= (x>>i);
}
x=tmp;
}
/* distance is just x after the conversion, in whatever format x was in */
int distance = x >> shift;
/* angle can be negative, so adjust if needed */
if (ang < 0) {
ang += (int)(2.0 * Math.PI * atan2Scale); // obviously a pre-computed constant
}
/* remember this is scaled by whatever you are using in your table */
return ang;
}
|
Hope that helps, now I'm curious about how it compares.
#24310 - DiscoStew - Sun Aug 01, 2004 6:45 am
Well, I tried your ArcTan function in a Win32 console project just so that I could get it working. I must say that it is pretty good, but it doesn't seem to be very accurate, extremely bad with distance. I've set the atan2Scale equal to (256 / PI) << 16 so that I could get 2PI = 512. When using point (0, -10), I got 377 which was surprisingly close, but when I tried (0, 10), I got 134, which was 6 degrees off. Increasing the steps helped, but only a little. Distance, however, was way off, at least in my tests. At point (0, 10), I got a result of 164, and point (-1000, 80) gave me 2720, not 1003. Ouch! Have you been getting the same results? Even when working with radians and 2PI = 360 has the same kind of estimation. I'd like to see the GBA version if it isn't too much trouble. thx
_________________
DS - It's all about DiscoStew
#24327 - ecurtz - Sun Aug 01, 2004 5:14 pm
I'm out of town visiting my parents, I'll post the code Moday evening. I admit I haven't tested the distance calculation, but I'm surprised it is so bad.
Try using (int)((1.0 / PI)*(1<<26)) for you table scaling and atan2(y,x)>>18 to get your angles. I'm not seeing anything off by more than one "degree"(512) with these values. Maybe there is something wrong with my testing methodology, I haven't used this in an actual game. Is it possible your table was getting calculated with integer math before the final conversion?
#24354 - DiscoStew - Mon Aug 02, 2004 3:44 am
Yes, that change to atan2Scale seemed to do it. Now I can only see about a 1.5 degree max error, but increasing the step even by 1 or 2 fixes it to be more accurate. Thanks a lot. Now, with this change, I'm beginning to see a pattern with what the Distance should be, and what the Distance becomes from the function, as shown here...
Point (1, 0) ; Actual Distance = 1 ; Function Distance = 1
Point (10, 0) ; Actual Distance = 10 ; Function Distance = 16
Point (100, 0) ; Actual Distance = 100 ; Function Distance = 164
Point (1000, 0) ; Actual Distance = 1000 ; Function Distance = 1646
From what I can see (and calculated), dividing the Function Distance by 1.646702791 (or multiply by 0.6072529195657680112885888735598) should give an approximate Distance. However, at points that are very close to the origin, like (1,0), will result in 0.
In terms of the GBA, an approximation would be something like this...
ActualDistance = (FuncDistance * 2487) >> 12;
Unless someone can fix the distance problem, I guess I'll be doing this this way. I'm go and convert this to the GBA so that I can compare it to the BIOS function. I don't know ASM, so I can't convert it to that.
_________________
DS - It's all about DiscoStew
#24380 - DiscoStew - Mon Aug 02, 2004 4:56 pm
Well, I have just done my comparison with the BIOS ArcTan2 and Cordic Rotators. Now I can't really say which one is the better because it all depends on what a programmer needs, just an ArcTan2 function, or a function that calculates the ArcTan and Distance at the same time.
For the Cordic Rotator test, the function is in C, and placed in IWRAM as ARM code, with its scale and table in IWRAM also, with PI = 256 deg, not 180 deg. For the BIOS test, a call to swi 0x0a0000, plus a simple formula to get an approximate angle between 0 and 360 (result = ((BIOS_Result * 727) >> 17);). I had tried to incorporate this simple formula into assembly, but it kept saying that there were bad arguments for the instruction 'mul', so this is running under THUMB right now. Also with the BIOS ArcTan is Distance calculated with an IWRAM ASM Sqrt function (by KevinW).
For calculating the test, I used the emulator VBA (of course), and used a counter which increments thoughout post and pre VBlank stages, so a test within 223 scanlines for each. Calculations resulting in angles divisible by PI/2 don't count because a pre-check could be done before actual processing. Results were written to an unused area in IWRAM, then retrieved through Memory viewer. The result is how many times the function was called, so higher means better.
Code: |
Point from Origin | CordicRot | BIOS | BIOS w/ Distance
(1, 1) | 843 | 949 | 743
(-76, 330) | 973 | 976 | 763
(345, -10432) | 928 | 1007 | 779
(-6346, -95324) | 928 | 999 | 773 |
As you can see, the BIOS ArcTan2 call is faster in practically all cases than the Cordic Rotators function, but when combined with an ASM Distance function to equal what the Cordic Rotators function can do, Cordic goes all the way. However, remember that the Cordic function was written in C. I could see the function go way faster if it were coded in ASM.
_________________
DS - It's all about DiscoStew
#24381 - ecurtz - Mon Aug 02, 2004 5:12 pm
Great, thanks for doing that testing. Of course I don't know if VBA is cycle accurate, does anyone know if I could do this with the shareware version of no$gba? (still waiting for Martin to get PayPal going, though.)
How many iterations were you using in the rotator?
I'm going to try to do an assembly implementation this week as a learning exercise. I'm post my progress here as I'm going if anybody wants to kibitz. Given how close the C version is to the BIOS I'd be VERY surprised if we can't beat it in all cases with a little head scratching. I need to learn ARM assembly at some point anyway so I can hang out here with the cool kids...
#24401 - DiscoStew - Tue Aug 03, 2004 2:36 am
VBA is not cycle accurate, so I would say that the results would be lower. I'd go and try it with the free version of no$gba, but that would require something like setting the screen to display the count, but I was only trying something quick and easy to check it faster. Later, I might try that.
Iterations? You mean steps? I'm using 8 like you had. Any number below that would result in a lot less accuracy.
ecurtz wrote: |
I need to learn ARM assembly at some point anyway so I can hang out here with the cool kids |
Me too. =) Starting Aug 16.
_________________
DS - It's all about DiscoStew
#24406 - torne - Tue Aug 03, 2004 3:04 am
Really should finish my static timing analyser at some point; it got spun off from the work I was doing on high-level assembly and never finished. For a cacheless system like the GBA it should be able to give you cycle-accurate timings without having to emulate anything at all, but right now it's too primitive to use on any real code examples.
#24552 - ecurtz - Thu Aug 05, 2004 4:47 pm
Well, I'm still working on the assembly, but here are a couple (should have been obvious) improvements for the C version I figured out while working on it.
Code: |
while((tmp & 0xFE000000) == 0) {
tmp <<= 1; shift++;
} |
should be replaced with something like
Code: |
if (tmp < 0x01000000)
shift += 3;
if (tmp < 0x00200000)
shift += 3;
if (tmp < 0x00040000)
shift += 3;
|
and the final angle should be
Code: |
(ang + (1<<16)) >> 18;
|
rather than just
Of course this was just typed in, but the ideas are correct, even if there are typos.
#24557 - DiscoStew - Thu Aug 05, 2004 7:28 pm
I went to take your optimizations into the test that I did earlier, but I then remembered that I began using that project for something else, so I went and recoded the test just as it was before these new optimizations. For some reason, I am getting different results that from earlier, and yet the actual angle and distance are correct (except for the last test, in which only the distance was not correct, it has been changed). Here is the redone test from earlier with the different results...
Code: |
Test done in a 223 scanline period (more means better)
Point from Origin | CordicRot | BIOS | BIOS w/ Distance
(1, 1) | 1199 | 1010 | 810
(-76, 330) | 1299 | 1031 | 802
(345, -10432) | 1307 | 1026 | 788
(-6346, -35324) | 1345 | 983 | 772 <-Y was changed |
If you are wondering about the test, I am too, as now the Cordic Rotator function now is faster, even though it is the exact same code used earlier. Even the BIOS function became faster, though now it isn't faster that CordicRot. Was my computer doing something else while these tests were running?
Now you must be waiting for the new test involving the changes. Right now I just want to say...
"ECURTZ!!!!! YOUR A GENIUS!!!!!!"
Why, well here is the results...
Code: |
Test done in a 223 scanline period (more means better)
Point from Origin | CordicRot
(1, 1) | 1733
(-76, 330) | 1671
(345, -10432) | 1545
(-6346, -35324) | 1545 <-Y was changed |
The outcome is incredible, and yet it is still in C, running as ARM code in IWRAM. Now I'd really like to see it ASM, how about you?
EDIT:
I think I know the reason why in my redone test the CordicRot function became faster. The atantable was converted into a LUT without needing to create it in real-time, and all computation that were meant to be pre-computated were changed to to their actual values. Plus I converted the function a little so now you can choose whether to return the Distance or the ArcTan, and if you also need the other, the values are stored in their own personal variable, which I used to make inline functions for both. The BIOS function, however, is a mystery to me as to why it is somewhat faster than before. I'll post the new function if you'd like. Remember, I was still using VBA, so it isn't cycle-accurate.
_________________
DS - It's all about DiscoStew
#24598 - ecurtz - Sat Aug 07, 2004 6:11 am
Hardly a genius, more of a "... great coders steal" situation, but thanks.
Now here's the payoff - I'm posting my first attempt at an assembly version into a new thread. Hopefully some of the experts will look it over and suggest any fixes. It is a little larger than I'd like, but it should be pretty fast.