gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies.

ASM > fast Abs() in ASM

#7928 - Lupin - Sat Jun 28, 2003 7:37 pm

At the moment I'm just using a simple cmp instruction and rsblt to negate the value, but I figured that an OR/ADD operation would also do what I want. I think this would get the absolute value of an s32:

(NUM | 0x80000000) + 0x7FFFFFFF

Which way would you use, and which would take fewer cycles?

#7935 - DekuTree64 - Sat Jun 28, 2003 10:31 pm

I'd use the cmp/rsblt. cmp and rsb take one cycle each, so you can't really beat that, and 0x7fffffff can't be created from an 8-bit value+shift, so you'd have to use a mvn r, #0x80000000, which would make it a total of 3 cycles.

#7954 - Lupin - Sun Jun 29, 2003 12:45 pm

Thanks! Hm, I figured out that the method above would still need a cmp instruction though :)
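
For reference, here is a quick C check that is not part of the original thread. It suggests that the OR/ADD expression from the first post does not produce the absolute value on its own: it forces the sign bit on and then adds 0x7FFFFFFF, which is not the same as negating. The function names are made up for this sketch, and the compare-and-negate helper is only a C stand-in for the cmp/rsblt pair.

```c
#include <assert.h>
#include <stdint.h>

/* The OR/ADD expression from the first post, in 32-bit wraparound
   arithmetic. or_add_expr is a hypothetical name for this sketch. */
uint32_t or_add_expr(uint32_t num)
{
    return (num | 0x80000000u) + 0x7FFFFFFFu;
}

/* The cmp/rsblt idea: negate only when the value is negative.
   The negation is done on the unsigned bit pattern so that
   INT32_MIN does not overflow. */
uint32_t abs_cmp_rsb(int32_t x)
{
    uint32_t u = (uint32_t)x;
    return (x < 0) ? (0u - u) : u;
}

/* A few spot checks comparing the two. */
void or_add_spot_checks(void)
{
    assert(abs_cmp_rsb(5) == 5u);
    assert(abs_cmp_rsb(-5) == 5u);
    assert(or_add_expr(5u) == 4u);                    /* not 5 */
    assert(or_add_expr((uint32_t)-5) == 0x7FFFFFFAu); /* not 5 */
}
```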

#8027 - beelzebub - Mon Jun 30, 2003 11:39 pm

Another way of doing abs() without a cmp/branch is as follows...

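r1 = r0 >> 31   // arithmetic shift: r1 is -1 if r0 is negative, else 0
r0 ^= r1        // flips every bit of r0 when r1 is -1
r0 -= r1        // subtracting -1 adds 1, completing the negation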
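
As a sketch of the same idea in portable C (not from the thread, and the function name is hypothetical): the sign mask is built from a logical shift of the unsigned bit pattern, because right-shifting a negative signed value is implementation-defined in C. The result is returned as unsigned so that the most negative input still has a representable answer.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless absolute value: xor with the sign mask, then subtract it.
   abs_branchless is a hypothetical name used only for this sketch. */
uint32_t abs_branchless(int32_t x)
{
    uint32_t u = (uint32_t)x;
    /* (u >> 31) is 1 for negative x and 0 otherwise; negating that
       gives 0xFFFFFFFF or 0, the same mask that ASR #31 produces. */
    uint32_t mask = 0u - (u >> 31);
    return (u ^ mask) - mask;
}

/* A few spot checks, including the edge cases. */
void abs_branchless_spot_checks(void)
{
    assert(abs_branchless(0) == 0u);
    assert(abs_branchless(64) == 64u);
    assert(abs_branchless(-64) == 64u);
    assert(abs_branchless(INT32_MIN) == 0x80000000u);
}
```

Note that for INT32_MIN the bit pattern comes back unchanged (0x80000000), which is also what the register versions in this thread would leave in r0.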

#8071 - Lupin - Tue Jul 01, 2003 4:15 pm

but this seems to be slow as hell :(

#8075 - DekuTree64 - Tue Jul 01, 2003 5:31 pm

Actually it should be the same speed. Just use
Code:

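eor r1, r0, r0, ASR #31    @ r1 = r0 ^ sign mask (mask is -1 if r0 < 0, else 0)
sub r0, r1, r0, ASR #31    @ r0 = r1 - sign mask (r0 still holds the original value here)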

I don't think it's possible to do in one instruction, so take your pick^^ Beelzebub's may be faster in THUMB though.
Code:

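asr r1, r0, #31    @ r1 = sign mask: -1 if r0 < 0, else 0
eor r0, r1         @ flip all bits of r0 when negative
sub r0, r1         @ subtract -1 (add 1) when negative
bx lr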

as opposed to
Code:

cmp r0, #0         @ compare r0 with zero
bgt positive       @ skip the negate when r0 > 0
neg r0, r0         @ r0 = -r0 (zero stays zero)
positive:
bx lr

which is also 3 instructions, but that branch will take a little extra time if the number was already positive

#8485 - Archeious - Fri Jul 11, 2003 9:47 pm

Couldn't you just or off the signed bit.

Value = Value and 0x80000000

Or am I missing something big here?

Archeious

#8487 - tepples - Fri Jul 11, 2003 10:44 pm

Archeious wrote:
Couldn't you just or off the signed bit.

Value = Value and 0x80000000

No.

You probably think that the system works like this: 0x00000000 = 0, 0x00000001 = 1, 0x00000040 = 64, 0x80000001 = -1, 0x80000040 = -64, etc. That's the "sign bit" representation, which no modern architecture uses for integers.

The most common representation of integers is two's complement. A negative number -n is represented as 0x1 0000 0000 minus n. Thus, 0xffffffff = -1, and 0xffffffc0 = -64.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
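
To make the two's complement point concrete, here is a small C sketch that is not from the thread; the function name is hypothetical. It checks the bit patterns quoted above and shows that clearing bit 31 (ANDing with 0x7FFFFFFF) leaves positive values alone but does not turn -64 into 64.

```c
#include <assert.h>
#include <stdint.h>

/* Clear bit 31 of the 32-bit pattern. clear_sign_bit is a
   hypothetical name used only for this sketch. */
uint32_t clear_sign_bit(int32_t x)
{
    return (uint32_t)x & 0x7FFFFFFFu;
}

/* Spot checks on the two's complement patterns. */
void twos_complement_spot_checks(void)
{
    /* Converting to uint32_t exposes the two's complement pattern. */
    assert((uint32_t)-1 == 0xFFFFFFFFu);
    assert((uint32_t)-64 == 0xFFFFFFC0u);

    assert(clear_sign_bit(64) == 64u);
    assert(clear_sign_bit(-64) == 0x7FFFFFC0u); /* 2147483584, not 64 */
}
```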

#8494 - Archeious - Sat Jul 12, 2003 12:57 am

tepples wrote:
Archeious wrote:
Couldn't you just or off the signed bit.

Value = Value and 0x80000000

No.

You probably think that the system works like this: 0x00000000 = 0, 0x00000001 = 1, 0x00000040 = 64, 0x80000001 = -1, 0x80000040 = -64, etc. That's the "sign bit" representation, which no modern architecture uses for integers.

The most common representation of integers is two's complement. A negative number -n is represented as 0x1 0000 0000 minus n. Thus, 0xffffffff = -1, and 0xffffffc0 = -64.


I guess that's what happens when you never get under the hood.

#8749 - Maddox - Sat Jul 19, 2003 2:15 am

It is.
_________________
You probably suck. I hope you're is not a game programmer.

#8752 - sgstair - Sat Jul 19, 2003 3:39 am

Here's another way to do an abs in ARM, which has the virtue of not using an extra register.

Code:

mov   r0,r0
rsbmi r0,r0,#0


For THUMB, the code quoted earlier is probably the best way:
Code:

asr r1, r0, #31
eor r0, r1
sub r0, r1


The ARM implementation will take 2 cycles to execute from IWRAM regardless of the value of r0; the THUMB version will take 3 cycles from IWRAM.

Assuming, however, that we are running from the cartridge with 3 nonsequential wait states and 1 sequential, it will take 8 cycles for the ARM version and 6 cycles for the THUMB version (if the cart is set to the default waitstate of 4-2, it will be 12 for ARM and 9 for THUMB).

-Stephen

#8760 - Dev - Sat Jul 19, 2003 6:03 am

A few corrections to the ARM side of things:

You need to use MOVS, not MOV, otherwise the status bits won't be set, and the RSBMI won't execute properly.

Obviously, if the status bits were set by a previous instruction that operated on R0, you don't even need the MOVS.

Also, it'll be 10 cycles from ROM at 3/1 (8 cycles for THUMB), but could be lower if prefetched.

Dev.
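
One way to reproduce both sets of ROM figures is sketched below in C. This is not from the thread, and it rests on an assumption about how the posters counted: only opcode fetches over the 16-bit cartridge bus are counted, an ARM opcode needs two halfword accesses and a THUMB opcode one, and each access costs 1 cycle plus its wait states. Prefetch and any other effects are ignored. The function name and parameters are hypothetical.

```c
#include <assert.h>

/* Fetch cycles for a run of opcodes read over a 16-bit bus.
   halfwords_per_opcode: 2 for ARM, 1 for THUMB.
   nonseq_ws, seq_ws: wait states; an access costs 1 + wait states.
   first_is_nonseq: nonzero if the first access is nonsequential. */
int rom_fetch_cycles(int opcodes, int halfwords_per_opcode,
                     int nonseq_ws, int seq_ws, int first_is_nonseq)
{
    int accesses = opcodes * halfwords_per_opcode;
    int first = 1 + (first_is_nonseq ? nonseq_ws : seq_ws);
    return first + (accesses - 1) * (1 + seq_ws);
}

/* Spot checks against the figures quoted in the thread. */
void rom_fetch_spot_checks(void)
{
    /* First access nonsequential, 3/1 wait states. */
    assert(rom_fetch_cycles(2, 2, 3, 1, 1) == 10); /* ARM, 2 opcodes   */
    assert(rom_fetch_cycles(3, 1, 3, 1, 1) == 8);  /* THUMB, 3 opcodes */

    /* Every access treated as sequential, 3/1 wait states. */
    assert(rom_fetch_cycles(2, 2, 3, 1, 0) == 8);
    assert(rom_fetch_cycles(3, 1, 3, 1, 0) == 6);
}
```

Under these assumptions, the difference between the two posts comes down to whether the first fetch is charged the nonsequential wait states.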

#8770 - sgstair - Sat Jul 19, 2003 1:46 pm

Dev wrote:
A few corrections to the ARM side of things:

You need to use MOVS, not MOV, otherwise the status bits won't be set, and the RSBMI won't execute properly.

Obviously, if the status bits were set by a previous instruction that operated on R0, you don't even need the MOVS.

Also, it'll be 10 cycles from ROM at 3/1 (8 cycles for THUMB), but could be lower if prefetched.

Dev.


yeah, sorry :)

-Stephen

#8776 - Lupin - Sat Jul 19, 2003 4:09 pm

Wouldn't a cmp or tst 0x8000 be faster than movs r0, r0?

#8780 - sgstair - Sat Jul 19, 2003 6:05 pm

Lupin wrote:
wouldn't an cmp or tst 0x8000 be faster then movs r0, r0?


How so? they both take 1S cycles

-Stephen