#63323 - ghost Leinad - Mon Dec 12, 2005 12:18 am
I was trying to make a routine for dividing with values other than powers of 2...I couldn't
then I found this:
"Dividing with values other than powers of 2 is trickier and is often not possible"
then how can I divide if I can't use other than powers of 2??? :S:S:S:S
I was thinking in using multiplications, but I can't use floating point either :S
ah, and the "module"??? (I don't know if this is the name, I'm guessin' it :P. You know the number left by the division - It's symbol is %) how do I handle that??
Tanx in advance
_________________
All human wisdom is summed up in these two words, - 'Wait and hope"
****************************************
My site www.myth-world.net and www.bmrpg.com :)
#63326 - poslundc - Mon Dec 12, 2005 12:45 am
ghost Leinad wrote: |
I was trying to make a routine for dividing with values other than powers of 2...I couldn't
then I found this:
"Dividing with values other than powers of 2 is trickier and is often not possible"
then how can I divide if I can't use other than powers of 2??? :S:S:S:S |
You can divide by using C's built-in divide operator (/). Just be aware that it will be dirt, dirt slow if either:
1. You're not dividing by a constant value (in other words, dividing by a variable), or
2. You have optimizations turned off.
There are numerous ways to optimize division - most notably, that you can perform simple bitshifts for denominators that are even powers of two - but there's nothing that says you can't do division at all; you just want to avoid it wherever possible because it is so time-consuming.
Quote: |
I was thinking in using multiplications, but I can't use floating point either :S |
So use fixed-point instead. A reciprocal lookup table is one of the better optimizations for division.
Quote: |
ah, and the "module"??? (I don't know if this is the name, I'm guessin' it :P. You know the number left by the division - It's symbol is %) how do I handle that?? |
The remainder of a division, or modulus operator also works fine, but is much trickier to optimize. Again, even powers of two are dead simple: just take the bitwise-AND with the value in question minus one, eg. (x % 32) == (x & 31) for all positive values of x (negative values behave slightly differently). Most other optimizations on modulus are concentrated on finding ways of optimizing division, and then using that to get the remainder. I'm not aware of any optimization specific to modulus without division, though.
Dan.
#63329 - ghost Leinad - Mon Dec 12, 2005 1:17 am
oh yes, modulus :P
yes it is a variable...
well...look, what I'm trying to do is a score system with four digits, but with just one variable for the four.
so, I need to obtain the single value per digit. For instance to get the units I make number%10.
In fact the game is very simple, for now I'm using the / and % operator without problems, I just want to improve the divisions.
Is there another way to do this? If it is let me know :D
_________________
All human wisdom is summed up in these two words, - 'Wait and hope"
****************************************
My site www.myth-world.net and www.bmrpg.com :)
#63335 - DekuTree64 - Mon Dec 12, 2005 2:02 am
Dividing by 10 for a score will be optimized to a reciprocal multiply by the compiler, so it will only take a few cycles. The slow thing is when you have variableOrConstant / variable.
And anyway, even 4 real divides per frame is nothing to worry about. The only time I've really needed a reciprocal table was for texture mapped 3D rendering, which involves several divides per triangle, times a few hundred triangles.
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
#63364 - Cearn - Mon Dec 12, 2005 12:18 pm
DekuTree64 wrote: |
Dividing by 10 for a score will be optimized to a reciprocal multiply by the compiler ... |
I keep hearing this, but what is this based on? If I try "y=x/10;", where x and y are globals so they won't be optimised out as constants, I still get a call to __divsi3, even at -O3 optimisation. Furthermore, it can't be optimized to a reciprocal division unless you know the range of the (variable) numerator to know what the safety limits are. Constant/constant can be done at compile-time, but as soon as you get to variables things will get murky.
ghost Leinad wrote: |
In fact the game is very simple, for now I'm using the / and % operator without problems, I just want to improve the divisions. |
If it's not causing problems, then maybe it's not really worth optimising for, see poslundc's posprintf page for details. Or just use that function to parse the number and be done with it.
#63391 - Miked0801 - Mon Dec 12, 2005 6:06 pm
I don't believe that a /10 will be compiler optimized. You'd have to do something like:
const s32 recip = 65536 / 10; // = 6554
y = (x * recip) >> 16;
Which is a hell of a lot faster than divisi3. But again, for score keeping, this is trivial overhead.
#63393 - kusma - Mon Dec 12, 2005 6:22 pm
a division by a constant can allways be performed with a 64-bit multiply (or multiply-add if you want proper rounding). the same goes for a modulo, except there's a bit more trickery involved. gcc with a decent optimization-level produce this code for you.
#63394 - poslundc - Mon Dec 12, 2005 6:28 pm
I wrote the following test function...
Code: |
int fn(int x)
{
return x / 10;
} |
... and compiled it against GCC 2.9 (I know; I'm a dinosaur) with level 1 optimizations enabled, and it produced the following ARM code:
Code: |
@ Generated by gcc 2.9-arm-000512 for ARM/elf
.gcc2_compiled.:
.text
.align 0
.global fn
.type fn,function
fn:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 1, current_function_anonymous_args = 0
mov ip, sp
stmfd sp!, {fp, ip, lr, pc}
sub fp, ip, #4
ldr r3, .L3
smull r2, r3, r0, r3
mov r0, r0, asr #31
rsb r0, r0, r3, asr #2
ldmea fp, {fp, sp, pc}
.L4:
.align 0
.L3:
.word 1717986919
.Lfe1:
.size fn,.Lfe1-fn |
So it is clearly doing the peephole-optimization for constant division in this case. When compiling as Thumb code, however, it generated the following even with level 3 optimizations enabled:
Code: |
@ Generated by gcc 2.9-arm-000512 for Thumb/elf
.code 16
.gcc2_compiled.:
.text
.align 2
.globl fn
.type fn,function
.thumb_func
fn:
push {lr}
mov r1, #10
bl __divsi3
pop {pc}
.Lfe1:
.size fn,.Lfe1-fn |
So it seems clear that when compiling for Thumb - which doesn't have access to the MULL family of instructions - reciprocal division for constant values is not considered a valid optimization; only when compiling ARM-targetting code.
Since the vast majority of GBA code is compiled for Thumb, though, this means I was incorrect in assuming that the peephole optimization would be performed all (or even most) of the time.
Dan.