gbadev.org forum archive

I have a set of functions that need to be optimized for speed and platform independence.

For Multiplication (by 10) I use

Code:

x = (y << 3) + (y << 1)

is there a similar way to do this for division (by10)?

Last edited by Joe_Sextus on Wed Nov 29, 2006 2:44 am; edited 1 time in total

short post:
Multiplying by ten is faster, don't bother with shifting and adding.

long post:
So we're trying to compare these two operations:

Code:

add r0,r1,r1
add r0,r0,r1,lsl#3

mov r0,#10
mul r0,r1,r0

Both of these take 2S+1I cycles to complete.
In THUMB mode, the mov and multiply wins, since the shifting and adding version would need more instructions.

The ARM7 already has a multiply instruction, so you don't need to replace it with shifting and adding.
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."

I'm not as concerned about the multiplication, as I am with division.

What is the fastest way to do division and keep the platform independence? Is this the best way with gcc?

Code:

y = x / 10

You can do reciprocal multiplication, like x*6554>>16. The 6554 there is (1<<16)/10 (which is 6553.6), rounded up. Be careful of overflowing 32 bits with the intermediate value though. Also, you'll start getting accuracy problems if x is too large. You could do a 64-bit multiply with (1<<32)/10 if necessary to deal with either of those.
I've heard before that GCC will automatically do reciprocal multiplies instead of real division when you divide by a constant, but I've never verified it personally.

And for the record, shift-add multiplies do have their place. For one thing, you can often squeeze in a free add with a non-shifted value, while mla takes an extra cycle to do it. More importantly though, you don't need a temporary register to store the value you're multiplying by. This can be a life saver in a tight inner loop if you'd have to swap things around on the stack otherwise.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

Joe_Sextus wrote:

I'm not as concerned about the multiplication, as I am with division.

What is the fastest way to do division and keep the platform independence? Is this the best way with gcc?

Code:

y = x / 10

gcc with optimizations turned on implement divisions by constants as long multiplies, so there's no real point in worrying about this.

kusma wrote:

gcc with optimizations turned on implement divisions by constants as long multiplies, so there's no real point in worrying about this.

In ARM mode it does the optimization... in Thumb mode, it does manual division :(

If you want to know how to combine shifts cumulatively to divide, this page goes into some detail (it's what I based my posprintf on). Suffice to say that it's far more mathematically complicated than multiplication, and you are usually far better off multiplying by a constant reciprocal when possible.

Dan.

Thanks for the replies I think in this case the reciprocal approach may be best. But I discovered two case that I am not sure of how to handle.

Code:

a = b / c
d = b % c

Is this actually the best way to do this for platform independence and speed or is there a better way?

if you wanna know it in arm try
http://www.ee.ic.ac.uk/pcheung/teaching/ee2_computing/arm/Progtech.pdf
page 70

Modulus (the % operator) can be implemented as a multiply and divide as so:

d = b % c

d = b - (b / c) * c

So once you have your result a = b / c, you can easily compute modulus with an additional multiply and subtraction.

gladius wrote:

Modulus (the % operator) can be implemented as a multiply and divide as so:

d = b % c

d = b - (b / c) * c

So once you have your result a = b / c, you can easily compute modulus with an additional multiply and subtraction.

I feel dumb for not thinking of that.

Many division routines actually have the modulus available to them upon termination - but for want of a place to return the value. Nothing like getting a divide and having the mod for free.

Miked0801 wrote:

Many division routines actually have the modulus available to them upon termination...

I would have used the swi because of this, but I need the code to be independent of the GBA / DS.

Joe_Sextus wrote:

Miked0801 wrote:

Many division routines actually have the modulus available to them upon termination...

I would have used the swi because of this, but I need the code to be independent of the GBA / DS.

Use a wrapper function?

-Brendan

The version of the divide I'm thinking of is a stand alone, ARM asm version. Basically, repeat 3 instructions per bit of the divide. When done you have the quotient and remainder both available for return. Jeff Frohein<sp> has a version of this divide on his site somewhere.

div10
; takes argument in a1
; returns quotient in a1, remainder in a2
; cycles could be saved if only divide or remainder is required
SUB a2, a1, #10 ; keep (x-10) for later
SUB a1, a1, a1, lsr #2
ADD a1, a1, a1, lsr #4
ADD a1, a1, a1, lsr #8
ADD a1, a1, a1, lsr #16
MOV a1, a1, lsr #3
ADD a3, a1, a1, asl #2
SUBS a2, a2, a3, asl #1 ; calc (x-10) - (x/10)*10
ADDPL a1, a1, #1 ; fix-up quotient
ADDMI a2, a2, #10 ; fix-up remainder
MOV pc, lr

Alternatively, compile the following as ARM code:

Code:

#include <stdlib.h>

div_t div10(int x)
{
div_t y;
y.quot= x/10;
y.rem= x%10;

return y;
}

In ARM the division and modulo will be turned into multiplications and subtractions, but it has the benefit of being portable and working for both positive and negative x). It does take a little longer than Ant6n's version because there's some stack-work involved.

I realized since my last post that both of my targets use versions of the ARM instruction set and that I will need division by a variable as well. So I googled for a division routine and found http://www.tofla.iconbar.com/tofla/arm/arm02/index.htm
My complete function (doesn't do signed division).

Code:

.align
.arm
.global div

@void div(unsigned int retvalue, unsigned int dividend, unsigned int divisor, unsigned int modulus)

div:
cmp r2, #0 @if(divisor == 0)
beq .div_end @ return;

mov r0, #0 @retvalue = 0;
mov r3, #1 @placeholder = 1;

.start: @while(divisor < dividend)
cmp r2, r1 @{
movls r2, r2, lsl#1 @ divisor <<= 1;
movls r3, r3, lsl#1 @ placeholder <<= 1;
bls .start @}

.subtract @while(r1 - r2 > 0)
cmp r1, r2 @{
subcs r1, r1, r2 @ dividend -= divisor;
addcs r0, r0, r3 @ retvalue += placeholder;
movs r3, r3, lsr#1 @ placeholder >>= 1;
movs r2, r2, lsr#1 @ divisor >>= 1;
bcc .subtract @}

.div_end:
bx lr @return;

Currently this code doesn't return the modulus, but it will when I finish it. (That's a simple mov r3, r1).
[EDIT]
Fixed typo.

To add signed, just xor the starting 2 numbers and look at bit 31. If set, the result is negative, else is positive. Then ABS and go.

Another speed trick is to completely unroll the divide loop and jump into the code based on the number of signifigant bits. Man I wish I had the routine handy - I'd post it. ARM also has a routine they publish with their stuff that has a nice divider.

CLZ in about 10 cycles - http://forum.gbadev.org/viewtopic.php?t=4359

I was thinking something similar to this for signed division

Code:

void sdiv(signed int retvalue, signed int dividend, signed int divisor, signed int modulus)
{
bool negative = false;
if(dividend & 0x80000000)
{
negative = 1;
dividend *= -1;
}

if(divisor & 0x80000000)
{
negative = !negative;
divisor *= -1;
}

div(retvalue, dividend, divisor, modulus);

if(negative)
retvalue *= -1;
}

Is there an easy way to change sign in ASM?

Joe_Sextus wrote:

Is there an easy way to change sign in ASM?

Code:

int changesign(int a) {
return 0-a;
}

int changesign(int a){
return (~a)+1;
}

keldon wrote:

Joe_Sextus wrote:

Is there an easy way to change sign in ASM?

Code:

int changesign(int a) {
return 0-a;
}

int changesign(int a){
return (~a)+1;
}

Are either of those more efficient than

Code:

int changesign(int a)
{
return a * -1;
}

Quickest in ARM:

Code:

rsb r0, r0, #0

Quickest in THUMB:

Code:

neg r0, r0

Multiplying is much slower and needs a temporary register:

Code:

mov r1, #-1
mul r0, r1, r0

_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

The compiler is smart enough in every case I've come across to change:

x = x * -1

to

x = -x;

Which involves only 1 instruction.

And change sign even simpler is

int changeSign(int x)
{
return -x;
}

This does the 2s compliment for you.

Miked0801 wrote:

The compiler is smart enough in every case I've come across to change:

x = x * -1

to

x = -x;

Which involves only 1 instruction.

And change sign even simpler is

int changeSign(int x)
{
return -x;
}

This does the 2s compliment for you.

That will probably be the way I will do it.

If I want to passed return values into the function arguments, do I have to handle pointers in ASM, or does the C register keyword allow me to do that?

Code:

void div(register unsigned int retvalue, register unsigned int dividend, register unsigned int divisor, register unsigned int modulus)

The arm calling convention by default puts the first 4 arguments that are 32 bits or less into r0-r3, so that scenario will "just work".

Also, in newer compilers the register keyword is completely ignored.

BTW, here:
http://www.peter-teichmann.de/adiv1e.html

in this regard it probably would be good if somebody mentions static inline functions, i dont remember how they work for sure though. I think its possible to just define a function like that in a header file
static inline int changeSign(int x)
{
return -x;
}
are there any compiler flags needed now??

Miked0801 wrote:

BTW, here:
http://www.peter-teichmann.de/adiv1e.html

I have already looked at that. When I first looked at it I didn't understand it, until I realized that it is the same basic thing as mine only with unrolled loops.

Ant6n wrote:

in this regard it probably would be good if somebody mentions static inline functions, i dont remember how they work for sure though. I think its possible to just define a function like that in a header file
static inline int changeSign(int x)
{
return -x;
}
are there any compiler flags needed now??

Why not just do this then

Code:

#define changeSign(x) (-x)

Joe_Sextus wrote:

Why not just do this then

Code:

#define changeSign(x) (-x)

I could riff on the advantages of type-checking and debugging and whatnot, but instead I'll just pounce on the example you gave to illustrate the dangers of reckless macro use.

What do you think will be the result if you use your changeSign macro with the following?

Code:

int a = 4;
int b = 5;
int c = changeSign(a + b);

Dan.

Change your macro to:

Code:

#define changeSign(x) (-(x))

To keep from messing up like poslundc is pointing out.

Quote:

I have already looked at that. When I first looked at it I didn't understand it, until I realized that it is the same basic thing as mine only with unrolled loops.

It takes it a step beyond actually - it does a binary search based on the MSB to prevent a whole bunch of wasted time. It's faster on smaller divides, but a touch slower on larger number divides. I found it to be very clever.

gbadev.org forum archive

C/C++ > Another Division Question

#110532 - Joe_Sextus - Wed Nov 29, 2006 1:10 am

#110541 - Dwedit - Wed Nov 29, 2006 2:28 am

#110543 - Joe_Sextus - Wed Nov 29, 2006 2:54 am

#110547 - DekuTree64 - Wed Nov 29, 2006 3:25 am

#110581 - kusma - Wed Nov 29, 2006 3:36 pm

#110591 - poslundc - Wed Nov 29, 2006 5:42 pm

#110658 - Joe_Sextus - Thu Nov 30, 2006 4:14 am

#110659 - Ant6n - Thu Nov 30, 2006 4:23 am

#110673 - gladius - Thu Nov 30, 2006 9:35 am

#110686 - Joe_Sextus - Thu Nov 30, 2006 2:07 pm

#110739 - Miked0801 - Fri Dec 01, 2006 2:40 am

#110742 - Joe_Sextus - Fri Dec 01, 2006 2:51 am

#110743 - sgeos - Fri Dec 01, 2006 3:54 am

#110839 - Miked0801 - Sat Dec 02, 2006 2:58 am

#110855 - Ant6n - Sat Dec 02, 2006 7:01 am

#110871 - Cearn - Sat Dec 02, 2006 1:30 pm

#110881 - Joe_Sextus - Sat Dec 02, 2006 4:18 pm

#111084 - Miked0801 - Mon Dec 04, 2006 1:04 am

#111092 - keldon - Mon Dec 04, 2006 1:41 am

#111093 - Joe_Sextus - Mon Dec 04, 2006 1:45 am

#111096 - keldon - Mon Dec 04, 2006 2:40 am

#111101 - Joe_Sextus - Mon Dec 04, 2006 2:56 am

#111103 - DekuTree64 - Mon Dec 04, 2006 3:12 am

#111195 - Miked0801 - Mon Dec 04, 2006 8:12 pm

#111201 - Joe_Sextus - Mon Dec 04, 2006 8:51 pm

#111210 - gladius - Mon Dec 04, 2006 10:14 pm

#111238 - Miked0801 - Tue Dec 05, 2006 3:43 am

#111241 - Ant6n - Tue Dec 05, 2006 4:17 am

#111248 - Joe_Sextus - Tue Dec 05, 2006 5:29 am

#111252 - poslundc - Tue Dec 05, 2006 8:12 am

#111293 - gmiller - Tue Dec 05, 2006 3:01 pm

#111352 - Miked0801 - Tue Dec 05, 2006 7:47 pm