gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

C/C++ > DIVIDE: gcc '/' operator vs. swi 0x60000

#28455 - pyros - Sun Oct 31, 2004 10:02 pm

It is frequently quoted that it is much faster to use swi 0x60000 than the '/' operator for division as the bios routine is far more efficient than the code that gcc produces.

However, all the examples i have seen put the bios call in a function which is then called to perform the division. The most concise i have seen is (tonc):

s32 SWIDiv(s32 num, s32 denom) { asm("swi 0x60000"); }

and division is performed:

quotient = SWIDiv(numerator,denominator);

Surely calling this function means that all the current registers (and possibly more things) are pushed onto the stack causing what could well be more overhead than a simple '/' ?

So: which is faster in practice?

#28459 - poslundc - Sun Oct 31, 2004 11:08 pm

pyros wrote:
Surely calling this function means that all the current registers (and possibly more things) are pushed onto the stack causing what could well be more overhead than a simple '/' ?


Unfortunately, there's nothing simple about the '/'. Because the ARM7 CPU doesn't have a built-in divide operation, division needs to performed manually using an algorithm that GCC provides, which is very, very ugly.

The BIOS routine is the fastest alternative if you don't want to do any "real" work (ie. coding your own faster routine, or using a reciprocal lookup table). The overhead of calling the function is relatively negligible compared to the actual division algorithm; there are ways to speed up the entry/exit a bit but nothing I would say that's worth doing unless you find it's slowing down your code, at which point you should replace it with a faster custom algorithm anyway.

Bear in mind that code that looks more compact in C doesn't necessarily mean the code it generates will be either shorter or faster.

Another thing to keep in mind is that while GCC's general division is mega-ugly, the compiler is smart enough to do some basic optimizations for you. For example, if it sees you dividing by a constant, even power of two it will do a bitshift instead, or if you are dividing by a constant (but not an even power of two) it will do reciprocal multiplication, which is much, much faster than any division algorithm.

So don't necessarily shy away from using '/', so long as you are aware of what's happening when you use it.

Dan.

#28460 - DiscoStew - Sun Oct 31, 2004 11:08 pm

The short version to answer your question:

The BIOS Divide is faster.

The long, and somewhat off-topic, version to answer your question:

Probably the best way to do division, in a sense, is to not do it. However, there will be those times where you really do need them. The FAQ under the Beginner's group shows the order as to what is faster. One thing I would change in that FAQ is that creating an ASM divide function would be faster than the BIOS divide. There is one floating around here; made by a person named Peter Teichman (last name not correct) that is capable of doing divisions in IWRAM as fast as 51 cycles, whereas the BIOS takes hundreds more. Find it, and use it.
_________________
DS - It's all about DiscoStew

#28463 - sajiimori - Sun Oct 31, 2004 11:36 pm

As opposed to an odd power of two? =P

#28464 - pyros - Sun Oct 31, 2004 11:39 pm

Thankyou both for your help :)

I was mostly concerned about the function overhead as i had read in a c optimising tutorial that calling a function can require a lot of overhead (with stack pushes).

I'm dividing unknown numbers (0 to 65536) by other unknown numbers (256 to 512), so i guess that bios will be a lot faster than '/' even with gcc optimising. I shall certainly look up the as-fast-as-51cycle function though :) I'm guessing it could (maybe) be possible to optimise a routine if i'm not using the full 32bits.

Thankyou :)

#28467 - poslundc - Sun Oct 31, 2004 11:53 pm

sajiimori wrote:
As opposed to an odd power of two? =P


As opposed to an uneven power of two, smarty. ;)

Dan.

#28471 - sajiimori - Mon Nov 01, 2004 12:36 am

Now I'm curious. What's an uneven power of two, and why would it prevent shift-divides?

#28472 - poslundc - Mon Nov 01, 2004 12:50 am

A power of two that is not a whole number.

Dan.

#28474 - sajiimori - Mon Nov 01, 2004 1:13 am

Ah, I'm not used to thinking of exponents as fractions.

#28475 - DekuTree64 - Mon Nov 01, 2004 1:16 am

pyros wrote:
I'm dividing unknown numbers (0 to 65536) by other unknown numbers (256 to 512), so i guess that bios will be a lot faster than '/' even with gcc optimising. I shall certainly look up the as-fast-as-51cycle function though :) I'm guessing it could (maybe) be possible to optimise a routine if i'm not using the full 32bits.


Yes, you certainly can optimize it for less bits, but for dividing by such a small range as 256-512, I would definitely recommend a reciprocal table. And only dividing u16's makes it even easier, because you only need 16-bit reciprocals to get perfect accuracy. That means your table can be u16's to save memory, and you don't need to use a long multiply instruction (which involves a function call to switch to ARM mode).
Create your table with
Code:
for(x = 256; x < 512; x++)
   table[x-256] = 65536/x;

and to use it, do
Code:
result = numer * table[denominator-256] >> 16;

_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

#28477 - isildur - Mon Nov 01, 2004 1:22 am

On my site, there is a divide routine coded in assembly. It is from Dooby.
Here is the link:

http://www.geocities.com/v_d_d/gba/math.html

If someone has faster code for this (a general divide routine), please post it ;)

#28499 - Miked0801 - Mon Nov 01, 2004 7:47 pm

[url]
http://www.peter-teichmann.de/adiv1e.html
[/url]

This will be 10-30% faster, but take more IWRAM

#28509 - isildur - Mon Nov 01, 2004 10:14 pm

Thanks Mike, I will try it. :-)