#3576 - regularkid - Fri Feb 28, 2003 6:19 am
How would I go about doing division in ASM? I could be wrong, but it seems that there is no 'div' command. Any ideas?
_________________
- RegularKid
#3579 - TurboHz - Fri Feb 28, 2003 9:14 am
You can divide either using logical shifts (only when you're dividing by 2,4,8,16...) or using the 0x06: Div BIOS call (check the CowBite Specs doc)
#3600 - sloat - Fri Feb 28, 2003 10:04 pm
you can also divide by a constant using umull and multiplying by the reciprocal
Code: |
;divide by 10
ldr r0, =1234
ldr r1, =0x1999999A ;((1/10)*2^32)+1
umull r1, r0, r0, r1 ;r0 = 123
|
_________________
http://sloat.bradsoft.net/
#3602 - regularkid - Fri Feb 28, 2003 11:30 pm
Sweet! Thanks! I didn't know that the bios has a divide function. That is great!
_________________
- RegularKid
#12860 - sigma - Sat Nov 29, 2003 12:27 am
Sorry to renew such an old topic, but you can do division with this procedure.
Given that the bit sizes are n / n = n:
1. Shift the dividend one bit left.
2. Rotate into a temp area, that is at least n+1 bits.
3. Compare the temp value with the divisor.
4. If the temp is >= the divisor, subtract the divisor and set the lsb of the dividend. Otherwise, do nothing.
5. Repeat n times
When finished, the previous dividend will now hold the n-bit quotient and the temp area holds the remainder.
Note that this only works for unsigned divisions.
#12888 - col - Sat Nov 29, 2003 6:30 pm
sigma wrote: |
Sorry to renew such an old topic, but you can do division with this procedure.
Given that the bit sizes are n / n = n:
1. Shift the dividend one bit left.
2. Rotate into a temp area, that is at least n+1 bits.
3. Compare the temp value with the divisor.
4. If the temp is >= the divisor, subtract the divisor and set the lsb of the dividend. Otherwise, do nothing.
5. Repeat n times
When finished, the previous dividend will now hold the n-bit quotient and the temp area holds the remainder.
Note that this only works for unsigned divisions. |
A long while ago, i had a look at the bios divide - it uses axactly this technique - shift & subtract.
it does it nice and fast using ARM asm (from zero waitstate mem ?)
What i ended up using for my special purpose though was a divide that uses a lookup table of reciprocals, with a little pre and post processing, I got a nice signed 16:16 fixed point divide with enough accuracy for the job and _much_ faster than the bios swi code.
(iirc it was < 50 cycles all in and accurate to a few(enough) decimal places)
cheers
Col
#12902 - poslundc - Sun Nov 30, 2003 7:06 am
Curious, Col, how big was your LUT?
Could you use it for normal division, or just reciprocal-multiplication? If the former, how did you go about obtaining the reciprocal of the denominator?
Dan.
#12905 - col - Sun Nov 30, 2003 2:27 pm
poslundc wrote: |
Curious, Col, how big was your LUT?
Could you use it for normal division, or just reciprocal-multiplication? If the former, how did you go about obtaining the reciprocal of the denominator?
Dan. |
My LUT was 32kbytes of ROM, I used reciprocal multiplication combined with some pre and post processing - to get a worst case error of about 0.00025
the vast majority of cases were considerably more accurate - the error scaled with the dividend, so very small numbers give very small errors.
~50cycles including dealing with -ve numbers
cheers
Col
#12907 - ampz - Sun Nov 30, 2003 3:04 pm
50 cycles?
Does not sound very efficient... Are you sure the BIOS call is not faster?
#12910 - col - Sun Nov 30, 2003 3:58 pm
ampz wrote: |
50 cycles?
Does not sound very efficient... Are you sure the BIOS call is not faster? |
the bios call best case is probably a little faster, but its worst case is much slower!
if i strip out my pre-processing and post processing, the code will be basicall just a lookup and a multiply, so much faster, but the worst case error will be far higher.
also, i was dividing a signed 16.16 fixed point number by another signed 16.16 fixed point number - using a shift subtract loop to divide numbers with that many bits will kill the cpu :)
so it really depends what you need. If you just want a rough answer, use a plain reciprocal LUT. If you are dividing integers by integers, and you don't mind a poor worst case performance, use the bios call... otherwise, you need a different solution :) somthing like what i have.. or a very large LUT - or maybe use a cordic rotator algorithm...
cheers
Col
#13022 - Miked0801 - Wed Dec 03, 2003 8:10 pm
Don't use the BIOS divide if you can at all help it. It's a poor implementation as it's a 5+ cycles per bit divide routine (it's looped instead of unrolled, see below). The bigger problem though, is that the system generates a SWI interrupt to get at it which pushes all sorts of registers, jumps to about 4 locations, goes through the SWI handler, then gets to the divide. Yuck. If you need a fast divide, place an unrolled version in IWRAM and call it there. Best case around 27 cycles, worst around 114 cycles with a binary bit search on the front end. If that's not good enough, then use a ROM recipricol table. That's 9 cycles for the 32-bit load from ROM, 1-5 for the MUL, and a couple for register overhead (say 3) so 13 - 18 cycles. I've used the table method in 3D rendering and it's worked out well (error low enough not to see polygon breakup).
The divide op codes - this way is designed to have the shift value decremented per bit from 31 to 0. You can do it as a shift 1 each time as well but it takes an extra register I believe. The BIOS adds a decrement and jump check to each of these instead of unrolling it so it's 40% slower than it could be.
cmp r0,r1,lsl#0
adc r2,r2,r2
subcs r0,r0,r1,lsl#0
;The bios does
cmp r2,r0,lsr1
movls r2,r2,lsl1
bcc to compare
then some addition processing and looping again. It's slow...