#7969 - Lupin - Sun Jun 29, 2003 5:24 pm
This function uses two LUTs to cover nearly all 32-bit integer values through approximations. It seems to work fine with small numbers, but as soon as I divide larger numbers it breaks, because the product overflows before I can shift the value back down after multiplying (see the last two instructions).
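A minimal C sketch of the failure, assuming the reciprocal is held as 2^20/d (a guess based on the final `lsr #20`). The hypothetical helper `recip_q20` computes that value with a real division, standing in for the divtab/difftab lookup, so only the multiply-and-shift step is being examined; `div_mul32` and `div_mul64` are likewise illustrative names.

```c
#include <stdint.h>

/* Assumed scale: the reciprocal is stored as 2^20 / d, matching the
   "shift 20 bits right" at the end of the assembly routine. */
#define RECIP_SHIFT 20

/* Stand-in for the table lookup: an exact (truncated) Q20 reciprocal. */
static uint32_t recip_q20(uint32_t d)
{
    return (uint32_t)((1u << RECIP_SHIFT) / d);
}

/* What the routine does now: a 32x32->32 multiply, then the shift.
   Once n * recip exceeds 0xFFFFFFFF the high bits are silently lost. */
static uint32_t div_mul32(uint32_t n, uint32_t d)
{
    return (uint32_t)(n * recip_q20(d)) >> RECIP_SHIFT;
}

/* Same computation with a 64-bit product, so nothing is lost before the shift. */
static uint32_t div_mul64(uint32_t n, uint32_t d)
{
    return (uint32_t)(((uint64_t)n * recip_q20(d)) >> RECIP_SHIFT);
}
```

Both versions give 33 for 100/3, but for 100000/3 the 32-bit product wraps and `div_mul32` returns 565 instead of 33333. Note also that because the reciprocal is truncated, even the 64-bit version can come out one too small (99/3 gives 32), so a final correction step is usually needed.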
If you see any part of the code that could be optimized, please let me know.
My goal in writing this function is to create a fast divide function which covers all integers without using a SWI or a lookup table larger than 1 MB :)
I would like to solve my problem without a 32b*32b=64b multiplication if possible, because I think it's slow (maybe slower than a software divide :))
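For reference, according to the ARM7TDMI timing tables MUL takes 1S+mI cycles and UMULL takes 1S+(m+1)I, where m (1 to 4) depends on the multiplier operand, so the 64-bit multiply should cost only one extra internal cycle, far less than a software divide loop. The sketch below models, in C, how the two result words of `umull r2, r3, r4, r0` would be recombined into the 20-bit-shifted quotient in place of the routine's final `mul r0, r4, r0` / `mov r0, r0, lsr #20` pair (r2 and r3 are free at that point). The function names are hypothetical.

```c
#include <stdint.h>

/* Models ARM's UMULL: the 64-bit product arrives as two 32-bit halves,
   e.g. "umull r2, r3, r4, r0" puts the low word in r2, the high word in r3. */
static void umull_model(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi)
{
    uint64_t p = (uint64_t)a * b;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* Bits 20..51 of the product, i.e. (a * b) >> 20 truncated to 32 bits.
   In assembly: "mov r0, r2, lsr #20" then "orr r0, r0, r3, lsl #12". */
static uint32_t mul_shift20(uint32_t a, uint32_t b)
{
    uint32_t lo, hi;
    umull_model(a, b, &lo, &hi);
    return (lo >> 20) | (hi << 12);  /* low word supplies bits 0..11, high word the rest */
}
```

With the Q20 reciprocal of 3 (349525), `mul_shift20(349525, 100000)` yields 33333, the value the 32-bit `mul` loses.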
Code:
```asm
@ Negative numbers are not working yet
@ In: r0 = numerator, r1 = denominator. Out: r0 = r0 / r1 (approximate)
    stmfd sp!, {r4-r7}          @ r4-r7 are callee-saved, so preserve them
    mov   r12, #0               @ r12 = sign flag (r5 is reused below for difftab)
    cmp   r1, #0                @ check the sign of the denominator
    movlt r12, #1               @ remember that it was negative (not applied to the result yet)
    rsblt r1, r1, #0            @ make the denominator positive
    mov   r3, r1, lsr #16       @ index into divtab (r1 / 65536)
    mov   r2, r3, lsl #16       @ shift it back 16 bits
    sub   r2, r1, r2            @ the difference (low 16 bits of r1)
    ldr   r4, .DivDat           @ load divtab address
    ldr   r5, .DiffDat          @ load difftab address
    ldr   r6, [r4, r3, lsl #2]  @ r6 = divtab[r3] (word table, so scale the index by 4)
    ldr   r7, [r5, r3, lsl #2]  @ r7 = difftab[r3]
    mul   r7, r2, r7            @ difference * difftab[r3]
    add   r4, r6, r7            @ 1/r1 = divtab[r3] + r2*difftab[r3]
    @ Here I would need a 32b*32b=64b multiplication, because the value gets quite big here...
    mul   r0, r4, r0            @ r4*r0 (r0 * (1/r1))
    @ ...but here I shift it 20 bits right, so it should fit into a 32-bit register again
    mov   r0, r0, lsr #20       @ shift r0 20 bits right
    ldmfd sp!, {r4-r7}          @ restore callee-saved registers
    bx    lr
```
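The goal stated earlier (a divide covering every 32-bit value, no SWI, tables far below 1 MB) can be modelled in C before committing to assembly. This is a minimal sketch of a different table layout than the one in the code, named plainly: the divisor is first normalized so its top bit is set, a 257-entry table of 2^62/x values (about 1 KB) is interpolated linearly, the quotient comes from a 64-bit product, and a short correction loop on the remainder makes the result exact. `TAB_BITS`, `recip_tab`, `init_recip_tab`, `clz32` and `udiv_lut` are hypothetical names. ARMv4T (the GBA's ARM7TDMI) has no CLZ instruction, so on hardware the normalization would need a short loop or a small lookup of its own.

```c
#include <stdint.h>

/* Hypothetical parameters: 2^TAB_BITS interpolation segments. */
#define TAB_BITS  8
#define TAB_SIZE  (1u << TAB_BITS)
#define FRAC_BITS (31 - TAB_BITS)   /* bits of the normalized divisor below the index */

/* recip_tab[i] = floor(2^62 / x_i) with x_i = 2^31 + i * 2^FRAC_BITS.
   On the GBA this would be generated offline and stored in ROM. */
static uint32_t recip_tab[TAB_SIZE + 1];

static void init_recip_tab(void)
{
    for (uint32_t i = 0; i <= TAB_SIZE; i++) {
        uint64_t x = (1ull << 31) + ((uint64_t)i << FRAC_BITS);
        recip_tab[i] = (uint32_t)((1ull << 62) / x);
    }
}

/* Count leading zeros; d must be non-zero. */
static unsigned clz32(uint32_t d)
{
    unsigned n = 0;
    while (!(d & 0x80000000u)) { d <<= 1; n++; }
    return n;
}

/* Unsigned n / d via an interpolated reciprocal; d must be non-zero. */
static uint32_t udiv_lut(uint32_t n, uint32_t d)
{
    unsigned s    = clz32(d);
    uint32_t dn   = d << s;                               /* dn in [2^31, 2^32) */
    uint32_t i    = (dn >> FRAC_BITS) & (TAB_SIZE - 1);   /* segment index */
    uint32_t frac = dn & ((1u << FRAC_BITS) - 1);         /* position inside it */

    /* Linear interpolation; 1/x is decreasing, so slope = this entry - next. */
    uint32_t base  = recip_tab[i];
    uint32_t slope = base - recip_tab[i + 1];
    uint32_t r = base - (uint32_t)(((uint64_t)slope * frac) >> FRAC_BITS);

    /* n / d = n * 2^s / dn ~= n * r / 2^(62 - s) */
    int64_t q = (int64_t)(((uint64_t)n * r) >> (62 - s));

    /* The estimate can be slightly off either way; fix it with the remainder. */
    int64_t rem = (int64_t)n - q * (int64_t)d;
    while (rem < 0)           { q--; rem += d; }
    while (rem >= (int64_t)d) { q++; rem -= d; }
    return (uint32_t)q;
}
```

Call `init_recip_tab()` once (or generate the table offline); after that `udiv_lut(100000, 3)` returns 33333 and `udiv_lut(99, 3)` returns 33. A larger table or a Newton-Raphson step would shrink the estimate error and therefore the number of correction iterations; both were left out to keep the sketch short.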