gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies. A new forum can be found here.

ASM > Integer -> String

#7620 - Lupin - Sat Jun 21, 2003 3:27 pm

Do you have any fast code to do that?

#7625 - DekuTree64 - Sat Jun 21, 2003 4:05 pm

The best way is to use the BIOS divide. You need both the quotient and the remainder of a division by 10, and the BIOS call gives you both at the same time. Here's how I do it:

Code:

char tempStr[16];   //global output buffer, still valid after itoa returns; 16 bytes is enough for "-2147483648"

char *itoa(int num)
{
   char *str, neg;
   long q, r, a;   //quotient, remainder, abs(quotient) (unused, but divfull needs it)

   if(num < 0)
   {
      neg = 1;
      num = -num;   //note: INT_MIN has no positive counterpart
   }
   else
      neg = 0;
   str = tempStr + sizeof(tempStr);   //start at the end and work backwards; copy the result before the next call
   *--str = 0; //null at the end
   q = num;
   do   //do/while, so that 0 still prints "0"
   {
      divfull(q, 10, &q, &r, &a);   //your own wrapper around BIOS SWI 6 (not shown)
      *--str = '0' + r;
   } while(q);
   if(neg)
      *--str = '-';
   return str;
}


I just typed that from memory, so it may have problems, but you get the idea.

#7626 - Lupin - Sat Jun 21, 2003 4:16 pm

I will try that

#7666 - Lupin - Sun Jun 22, 2003 7:16 pm

I turned the code into this:

Code:

char* itos(int num) {
   static char str[16];
   u32 p = 15, r = 0, a = 0, neg = 0;   //str[15] is the last valid slot

   if(num < 0) {
      neg = 1;
      num = -num;
   }

   str[p] = 0; //add the terminator char

   do {
      num = divfull(num, 10, &r, &a);   //this divfull returns the quotient; the remainder goes in r
      str[--p] = r + 48;   //turn the digit into its ASCII value ('0' == 48)
   } while(num);

   if(neg == 1)
      str[--p] = '-';

   return str + p;
}

I had some problems with the divfull function, because the SWI put the remainder in r1, not in r2 :/

I'm now trying to get this working with ASM code.

#19357 - mr_schmoe - Sat Apr 17, 2004 9:54 pm

What library do you need to use divfull? My compiler says it doesn't know what divfull is. I understand that the GBA doesn't have a hardware divide, so how does divfull work? Maybe I can just implement those routines myself.

#19363 - DekuTree64 - Sat Apr 17, 2004 11:46 pm

What you need is the quotient and remainder. divfull will return both with a single divide call, so it's faster than doing one divide for the quotient and a second for the remainder. You could just use
Code:
r = q % 10;
q = q / 10;

in place of divfull, and the compiler may optimize it to a single divide, depending on how GCC's divide function works (I've always used div() type functions on GBA, so I don't know much about the library one)
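divfull isn't part of any standard library. In this thread it is the poster's own wrapper around the BIOS Div call (SWI 6), which returns the quotient in r0, the remainder in r1 and the absolute value of the quotient in r3. If you only want to test the string code on a PC, a plain C stand-in might look like the sketch below.

The signature mirrors the divfull(q, 10, &q, &r, &a) call in the itoa above. It is an assumption, not a real API. The sketch also assumes that the BIOS remainder takes the sign of the numerator, as C's % does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical portable stand-in for a divfull() wrapper around BIOS SWI 6.
 * SWI 6 returns r0 = num / den, r1 = num % den, r3 = abs(num / den). */
void divfull(long num, long den, long *quot, long *rem, long *absq)
{
    assert(den != 0);           /* dividing by zero is undefined here */
    long q = num / den;         /* C99: truncates toward zero */
    long r = num % den;         /* remainder has the sign of num */
    *quot = q;                  /* safe even if quot points at the caller's num */
    *rem  = r;
    *absq = labs(q);
}
```

Because num is passed by value, the itoa loop can safely pass &q as both the input's home and the quotient's destination.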

Since the original post, though, I've written a much faster itoa in ASM that uses reciprocal multiplication in place of the divide. It is only accurate for small values: 6554/65536 is slightly more than 1/10, and the quotient first comes out wrong at 16389, so it won't work if you're printing bigger numbers. I could do a 32-bit one, but it would have to use ARM code to get the long multiplication instructions. It would therefore be much slower unless placed in IWRAM, and that would be a waste because it's not used that often.
Anyway, here it is
Code:
.global itoa
.thumb
.align 2
.thumb_func
itoa:
push {r4, r5}
mov r5, r0
bpl itoa_plus
neg r0, r0
itoa_plus:
ldr r4, =tempStr @a global array, should be 16 bytes. You could also pass the output location as a parameter if you don't like globals
add r4, #15 @start at the end of the string and work our way back
mov r1, #0
strb r1, [r4]
ldr r1, =6554 @1/10 in 16.16 fixed point (65536/10 = 6553.6, rounded up)

itoa_loop:
mov r3, r0
mul r3, r1 @mul by 1/10
lsr r3, r3, #16 @which was fixed-point, so shift down
lsl r2, r3, #3
add r2, r3
add r2, r3 @*8+1+1 = multiplied by 10
sub r0, r2 @which when subtracted from the original gives you the remainder
add r0, #'0'
sub r4, #1
strb r0, [r4]
mov r0, r3
bne itoa_loop

cmp r5, #0
bpl itoa_return
mov r0, #'-'
sub r4, #1
strb r0, [r4]

itoa_return:
mov r0, r4
pop {r4, r5}
bx lr

.pool

_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku
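You can check how far the reciprocal trick can be trusted by modelling the loop's divide step in C on a PC and comparing it against a real divide. The function names here are made up for this sketch. Because 6554/65536 is a little more than 1/10, the error grows with n until it tips a quotient over to the next integer.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the divide step in the Thumb itoa: multiply by 6554
 * (1/10 in 16.16 fixed point, 65536 / 10 = 6553.6 rounded up) and
 * shift down. The product fits in 32 bits for any n < 65536. */
uint32_t div10_recip(uint32_t n)
{
    return (n * 6554u) >> 16;
}

/* Returns the first n below limit where the shortcut disagrees with
 * a real n / 10, or limit if it never does. */
uint32_t first_div10_failure(uint32_t limit)
{
    for (uint32_t n = 0; n < limit; n++)
        if (div10_recip(n) != n / 10)
            return n;
    return limit;
}
```

By this model, first_div10_failure(65536) returns 16389 (6554 * 16389 >> 16 gives 1639 instead of 1638). That means the routine is exact only for inputs from 0 to 16388.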

#19367 - poslundc - Sun Apr 18, 2004 1:26 am

You can check out my posprintf library for fast, free integer-to-string conversion using the familiar sprintf interface.

Dan.

#23832 - Kay - Thu Jul 22, 2004 4:57 pm

You could try this method, which may be the fastest way to convert a binary integer to 4-bit BCD (Binary Coded Decimal)... and then to a string (just add $30 to each 4-bit digit you extract to get ASCII).

I personally use this piece of code, translated to ARM assembly, in my own applications.

No divide!
No reciprocal multiply!

Source quickly coded in BlitzBasic...
It's self-explanatory, no? :)

It works for unsigned values from 0 to 65535, and it can be adapted for much larger ones...


INT value to convert in n
result in a

Code:
n = 12345
a = 0
For i = 0 To 15
   Read bcd%
   If ((n Shr i) And 1) = 1 Then
      t1 = $06666666 + a
      t2 = t1 + bcd%
      t3 = t1 Xor bcd%
      t4 = t2 Xor t3
      t5 = ($ffffffff Xor t4) And $11111110
      t6 = (t5 Shr 2) Or (t5 Shr 3)
      a = (t2 - t6)
   EndIf
Next

Print Hex(a)
KeyWait()

End

.table:
   Data   $1
   Data   $2
   Data   $4
   Data   $8
   Data   $16
   Data   $32
   Data   $64
   Data   $128
   Data   $256
   Data   $512
   Data   $1024
   Data   $2048
   Data   $4096
   Data   $8192
   Data   $16384
   Data   $32768
   Data   $65536
   Data   $131072
   Data   $262144
   Data   $524288
   Data   $1048576
   Data   $2097152
   Data   $4194304
   Data   $8388608
   Data   $16777216
   Data   $33554432
   Data   $67108864




Kay.
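Kay's loop boils down to "for every set bit i of n, BCD-add the decimal value of 2^i". The t1..t6 lines are a packed-BCD add done with an ordinary binary adder:

- Biasing the low seven digits of a by 6 makes any digit sum of 10 or more carry into the next nibble.
- XORing the sum with both addends shows which nibbles received a carry.
- The 6 is then taken back out of every nibble that didn't carry.

Below is a C sketch of that step, plus the "add $30 to each nibble" string step. The names bcd_add and bcd_to_str are made up, and both functions assume at most 8 digits.

```c
#include <assert.h>
#include <stdint.h>

/* Packed-BCD add of two 8-digit values (the result must also fit in 8 digits).
 * Same steps as Kay's t1..t6, written with C operators. */
uint32_t bcd_add(uint32_t a, uint32_t b)
{
    uint32_t t1 = a + 0x06666666u;          /* bias digits 0-6 by 6 */
    uint32_t t2 = t1 + b;                   /* binary sum; digit sums >= 10 now carry */
    uint32_t t3 = t1 ^ b;                   /* the sum without any carries */
    uint32_t t4 = t2 ^ t3;                  /* bit 4k set = carry into nibble k */
    uint32_t t5 = ~t4 & 0x11111110u;        /* nibbles k where nibble k-1 did NOT carry */
    uint32_t t6 = (t5 >> 2) | (t5 >> 3);    /* a 6 in each such nibble k-1 */
    return t2 - t6;                         /* remove the unused bias */
}

/* Write the 8 BCD digits of v as ASCII (leading zeros kept) into out[9]. */
void bcd_to_str(uint32_t v, char out[9])
{
    for (int i = 7; i >= 0; i--) {
        out[i] = (char)('0' + (v & 0xF));   /* '0' == $30 */
        v >>= 4;
    }
    out[8] = '\0';
}
```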

#23855 - MumblyJoe - Fri Jul 23, 2004 1:12 am

Kay wrote:

Source quickly coded in BlitzBasic...
It's self-explanatory, no? :)


Not exactly self-explanatory... anyone wanna rewrite this in C or asm...
_________________
www.hungrydeveloper.com
Version 2.0 now up - guaranteed at least 100% more pleasing!

#23856 - poslundc - Fri Jul 23, 2004 2:46 am

*cracks knuckles*

Well, my BASIC skills aren't too rusty, I hope... but I believe in C it would translate into something like this:

Code:
static const unsigned int bcdData[] = {
   0x1, 0x2, 0x4, 0x8,
   0x16, 0x32, 0x64, 0x128,
   0x256, 0x512, 0x1024, 0x2048,
   0x4096, 0x8192, 0x16384, 0x32768,
   0x65536, 0x131072, 0x262144, 0x524288,
   0x1048576, 0x2097152, 0x4194304, 0x8388608,
   0x16777216, 0x33554432, 0x67108864
};

unsigned int Bin2BCD(unsigned int n)
{
   unsigned int   a, bcd, i, t1, t2, t3, t4, t5, t6;
   
   a = 0;
   
   for (i = 0; i < 16; i++)
   {
      bcd = bcdData[i];
      if ((n >> i) & 1)
      {
         t1 = 0x06666666 + a;
         t2 = t1 + bcd;
         t3 = t1 ^ bcd;
         t4 = t2 ^ t3;
         t5 = (0xFFFFFFFF ^ t4) & 0x11111110;
         t6 = (t5 >> 2) | (t5 >> 3);
         a = t2 - t6;
      }
   }
   
   return a;
}


It's an interesting-looking routine, although I haven't tested it or anything. And I'm not sure why the last 11 elements of the array are required, since they don't seem to get referenced at all by the routine.

I'm still pretty sure my posprintf is faster, though. ;)

Dan.
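On the question of the extra entries: entry i is just 2^i written in BCD. Entries 16 and up are only read if the loop runs over more than 16 input bits, and Kay's ARM version later in the thread loops over 25.

Rather than typing the table in, you could generate it. Here's a C sketch; to_bcd and make_bcd_table are hypothetical helper names. It stops at 2^26 = 67108864, the largest power of two that still fits in 8 BCD digits.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a decimal value (at most 99999999) into 8 BCD nibbles, digit by digit. */
uint32_t to_bcd(uint32_t n)
{
    uint32_t bcd = 0;
    assert(n <= 99999999u);
    for (int shift = 0; n != 0; shift += 4) {
        bcd |= (n % 10) << shift;
        n /= 10;
    }
    return bcd;
}

/* Fill table[i] = BCD(2^i) for i = 0..26; 2^27 would need 9 digits. */
void make_bcd_table(uint32_t table[27])
{
    for (int i = 0; i < 27; i++)
        table[i] = to_bcd((uint32_t)1 << i);
}
```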

#23875 - col - Fri Jul 23, 2004 10:49 am

Please don't use the BIOS divide, it's very slow! Just use the / operator. The divisor is a constant, so the compiler should optimise the divide away (in ARM mode, devkitAdv's gcc uses reciprocal multiplication).

Code:

void u32toa(u32 value, char* output) {

   char ch[12];   //enough for max numChars required
   u32 v = value;
   u32 oldv = 0;
   s32 chIdx = 0;         
   do{      
      oldv = v;
      v /= 10;    //compiler should optimise divide by constant - best to check asm output
      ch[chIdx] = (oldv - (v*10) + 48);   //48 is start of numbers in ASCII
      ++chIdx;
   }while(v);
   //we have our decimal digits but in reverse order... so re-order them
   s32 outIdx = 0;
   do{
      --chIdx;
      output[outIdx] = ch[chIdx];
      ++outIdx;
   }while(chIdx > 0);
   output[outIdx] = 0;   //terminating zero
   return;
}


char* output had better be big enough to hold the maximum number of digits (10 for a u32, plus the terminating zero)...
If you want to use signed integers, you will need a little extra pre- and post-processing.

cheers

Col
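One way to do the "little extra pre and post processing" for signed values is to write the '-' first and then convert the magnitude exactly as above. Here's a minimal self-contained sketch. The name s32toa is hypothetical, and the C99 int32_t/uint32_t types stand in for s32/u32. Negating in unsigned arithmetic keeps INT32_MIN from overflowing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Signed wrapper sketch: emit '-' first, then convert the magnitude.
 * output must hold at least 12 chars ("-2147483648" + '\0'). */
void s32toa(int32_t value, char *output)
{
    uint32_t mag = (uint32_t)value;
    char tmp[10];               /* a 32-bit magnitude has at most 10 digits */
    int n = 0;

    if (value < 0) {
        *output++ = '-';
        mag = 0u - mag;         /* two's-complement magnitude, safe for INT32_MIN */
    }
    do {                        /* digits come out least significant first */
        uint32_t q = mag / 10;  /* constant divisor: compiler can use a reciprocal multiply */
        tmp[n++] = (char)('0' + (mag - q * 10));
        mag = q;
    } while (mag);
    while (n > 0)               /* copy back in the right order */
        *output++ = tmp[--n];
    *output = '\0';
}
```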

#23878 - Kay - Fri Jul 23, 2004 12:25 pm

Here's the ARM assembly version of my int-to-BCD conversion code, for those having trouble translating it into assembly (it's a very simple thing).

Handing out code like this directly doesn't seem very educational, though. The best way IMHO is to understand things yourself from formulas, methods or examples... and of course, by coding them.


Feel free to compare its overall performance with other pieces of code.
Thanks for any feedback...



Code:
Binary_to_BCD_conversion:
; in  R0 => binary value [24 bits max]
; out R1 <= BCD value [8 digits 0 - 99999999]

   mov   r1,0               ; a
   mov   r2,24
   ldr   r3,=BCD_conversion_table      ; BCD
   ldr   r4,=$06666666            ; carry care overflow value
   ldr   r5,=$11111110            ; carry count mask value
   mov   r6,1
_Binary_to_BCD_conversion_loop:
   ldr   r7,[r3],4            ; BCD (read BCD table, post-increment)
   tst   r0,r6 lsl r2            ; test bit
   beq   _Binary_to_BCD_conversion_loop_count
   add   r8,r4,r1            ; t1 = $06666666 + a
   add   r9,r8,r7            ; t2 = t1 + BCD
   eor   r10,r8,r7            ; t3 = t1 XOR BCD
   eor   r11,r9,r10            ; t4 = t2 XOR t3
   mvn   r10,0               ; t3 = $FFFFFFFF
   eor   r10,r10,r11            ; t3 = t3 XOR t4
   and   r10,r10,r5            ; t3 = t3 AND $11111110
   mov   r8,r10 lsr 2            ; t1 = t3 LSR 2
   orr   r8,r8,r10 lsr 3            ; t1 = t1 OR t3 LSR 3
   sub   r1,r9,r8            ; a = t2 - t1
_Binary_to_BCD_conversion_loop_count:
   subs   r2,r2,1
   bge   _Binary_to_BCD_conversion_loop

   mov   pc,lr

BCD_conversion_table:
   @DCD      $16777216,$08388608,$04194304,$02097152,$01048576,$00524288,$00262144,$00131072
   @DCD      $00065536,$00032768,$00016384,$00008192,$00004096,$00002048,$00001024,$00000512
   @DCD      $00000256,$00000128,$00000064,$00000032,$00000016,$00000008,$00000004,$00000002
   @DCD      $00000001





-- Kay

#23890 - f(DarkAngel) - Fri Jul 23, 2004 4:09 pm

http://cvs.sourceforge.net/viewcvs.py/demi/stdgba/src/stdgba_stdlib.S?rev=1.1&view=auto
_________________
death scream...

#23891 - poslundc - Fri Jul 23, 2004 4:21 pm

Ugh... it loops over an SWI6... Dan no like...

Dan.

#23895 - Miked0801 - Fri Jul 23, 2004 6:30 pm

:) - There are faster ways indeed

#26153 - isildur - Wed Sep 08, 2004 5:04 pm

I know it's an old thread but I needed a routine to convert a hex number to BCD in ARM asm, so I took Kay's code above and modified it so it could compile in devkitARM.

For those interested, here it is:
Code:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@
@ u32 Bin2BCD(u32 hexVal);
@
@ in  r0 => binary value [24 bits max]
@ out r0 <= BCD value [8 digits 0 - 99999999]

   .global Bin2BCD
   .arm
   .align 2
Bin2BCD:
   stmfd   sp!, {r4, r5, r6, r7, r8, r9, r10, r11}

   mov   r1, #0
   mov   r2, #24
   ldr   r3, =BCD_conversion_table      @ BCD
   ldr   r4, =0x06666666            @ carry care overflow value
   ldr   r5, =0x11111110            @ carry count mask value
   mov   r6, #1

_Binary_to_BCD_conversion_loop:
   ldr   r7, [r3], #4               @ BCD (read BCD table)
   tst   r0, r6, lsl r2               @ test bit

   beq   _Binary_to_BCD_conversion_loop_count
   add   r8, r4, r1               @ t1 = $06666666 + a
   add   r9, r8, r7               @ t2 = t1 + BCD
   eor   r10, r8, r7            @ t3 = t1 XOR BCD
   eor   r11, r9, r10            @ t4 = t2 XOR t3
   mvn   r10, #0               @ t3 = $FFFFFFFF
   eor   r10, r10, r11            @ t3 = t3 XOR t4
   and   r10, r10, r5            @ t3 = t3 AND $11111110
   mov   r8, r10, lsr #2         @ t1 = t3 LSR 2
   orr   r8, r8, r10, lsr #3        @ t1 = t1 OR t3 LSR 3
   sub   r1, r9, r8               @ a = t2 - t1

_Binary_to_BCD_conversion_loop_count:
   subs   r2, r2, #1
   bge   _Binary_to_BCD_conversion_loop
   
   mov r0, r1

   ldmfd   sp!, {r4, r5, r6, r7, r8, r9, r10, r11}
   bx lr

   .pool

BCD_conversion_table:
   .word 0x16777216
   .word 0x08388608
   .word 0x04194304
   .word 0x02097152
   .word 0x01048576
   .word 0x00524288
   .word 0x00262144
   .word 0x00131072
   .word 0x00065536
   .word 0x00032768
   .word 0x00016384
   .word 0x00008192
   .word 0x00004096
   .word 0x00002048
   .word 0x00001024
   .word 0x00000512
   .word 0x00000256
   .word 0x00000128
   .word 0x00000064
   .word 0x00000032
   .word 0x00000016
   .word 0x00000008
   .word 0x00000004
   .word 0x00000002
   .word 0x00000001


By the way, if someone has a faster routine for this, I'd be interested. ;)

#26156 - FluBBa - Wed Sep 08, 2004 5:55 pm

Without actually paying much attention to the actual algorithm...
What's the difference between these two?
Code:

mvn  r10, #0          @ t3 = $FFFFFFFF
eor  r10, r10, r11        @ t3 = t3 XOR t4
----------------------------------------
mvn  r10, r11        @ t3 = ~t4


And maybe we can do like this...
Code:

mov  r8, r10, lsr #2      @ t1 = t3 LSR 2
orr  r8, r8, r10, lsr #3    @ t1 = t1 OR t3 LSR 3
sub  r1, r9, r8          @ a = t2 - t1
----------------------------------------
orr  r8, r8, r10, lsr #1    @ t1 = t1 OR t3 LSR 1
sub  r1, r9, r8,lsr#2          @ a = t2 - t1 LSR 2

_________________
I probably suck, my not is a programmer.

#26159 - isildur - Wed Sep 08, 2004 6:10 pm

FluBBa wrote:
Without actually paying much attention to the actual algorithm...
What's the difference between these two?
Code:

mvn  r10, #0          @ t3 = $FFFFFFFF
eor  r10, r10, r11        @ t3 = t3 XOR t4
----------------------------------------
mvn  r10, r11        @ t3 = ~t4


And maybe we can do like this...
Code:

mov  r8, r10, lsr #2      @ t1 = t3 LSR 2
orr  r8, r8, r10, lsr #3    @ t1 = t1 OR t3 LSR 3
sub  r1, r9, r8          @ a = t2 - t1
----------------------------------------
orr  r8, r8, r10, lsr #1    @ t1 = t1 OR t3 LSR 1
sub  r1, r9, r8,lsr#2          @ a = t2 - t1 LSR 2


Code:

mvn  r10, r11        @ t3 = ~t4


Yep, this works, but the second optimization doesn't.

#26191 - FluBBa - Thu Sep 09, 2004 9:09 am

Sorry, the last one should be
Code:

orr  r8, r10, r10, lsr #1    @ t1 = t1 OR t3 LSR 1
sub  r1, r9, r8,lsr#2          @ a = t2 - t1 LSR 2


#26200 - isildur - Thu Sep 09, 2004 2:42 pm

Cool, now it works :).

#26201 - FluBBa - Thu Sep 09, 2004 3:43 pm

A little recoding to save some registers...
Code:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@
@ u32 Bin2BCD(u32 hexVal);
@
@ in  r0 => binary value [24 bits max]
@ out r0 <= BCD value [8 digits 0 - 99999999]

Bin2BCD:
   stmfd   sp!, {r4, r5, r6, r7, r8}

   mov   r1, #0
   ldr   r3, =BCD_conversion_table      @ BCD
   ldr   r4, =0x06666666            @ carry care overflow value
   ldr   r5, =0x11111110            @ carry count mask value

_Binary_to_BCD_conversion_loop:
   ldr   r7, [r3], #4               @ BCD (read BCD table)
   movs   r0, r0, lsr #1               @ test bit

   bcc   _Binary_to_BCD_conversion_loop_check
_Binary_to_BCD_calc:
   add   r8, r4, r1               @ t1 = $06666666 + a
   add   r6, r8, r7               @ t2 = t1 + BCD
   eor   r2, r8, r7            @ t3 = t1 XOR BCD
   eor   r2, r6, r2            @ t3 = t2 XOR t3
   mvn   r2, r2               @ t3 = ~t3
   and   r2, r2, r5            @ t3 = t3 AND $11111110
   orr   r8, r2, r2, lsr #1        @ t1 = t3 OR t3 LSR 1
   sub   r1, r6, r8, lsr #2           @ a = t2 - (t1 LSR 2)

_Binary_to_BCD_conversion_loop_check
   bne   _Binary_to_BCD_conversion_loop
   mov r0, r1

   ldmfd   sp!, {r4, r5, r6, r7, r8}
   bx lr


   .pool

BCD_conversion_table:
   .word 0x00000001
   .word 0x00000002
   .word 0x00000004
   .word 0x00000008
   .word 0x00000016
   .word 0x00000032
   .word 0x00000064
   .word 0x00000128
   .word 0x00000256
   .word 0x00000512
   .word 0x00001024
   .word 0x00002048
   .word 0x00004096
   .word 0x00008192
   .word 0x00016384
   .word 0x00032768
   .word 0x00065536
   .word 0x00131072
   .word 0x00262144
   .word 0x00524288
   .word 0x01048576
   .word 0x02097152
   .word 0x04194304
   .word 0x08388608
   .word 0x16777216

I think that should work

#26203 - isildur - Thu Sep 09, 2004 4:04 pm

Great job! One thing though, the label _Binary_to_BCD_conversion_loop_check is missing a colon :)

You know, I'm thinking of creating a repository of fast ARM utility routines for the GBA. Instead of having to search this forum, common fast routines could be found at one web page. Coders would be challenged to improve and optimize those routines and contribute new ones, always in the quest to make the fastest code.

I will prepare something...

#26210 - Miked0801 - Thu Sep 09, 2004 5:46 pm

Please. That's the kind of thing I could sink my teeth into :)

#26225 - ecurtz - Thu Sep 09, 2004 8:46 pm

That would be great. I'm still learning ARM assembly and I find that reading through code and thinking about optimizing it is the easiest way to improve my knowledge of the instruction set.

#26254 - FluBBa - Fri Sep 10, 2004 9:33 am

I've been thinking about doing the same thing for some time now. If you start it up, I'll be sure to contribute.
I have a lot of routines in my emulators; the latest is a gamma corrector which is quite good and not too specialized. It takes a byte (0x00-0xFF) and a gamma value, and spits out a 5-bit (0x00-0x1F) gamma-corrected value.
We should probably add the pixel plotter and line drawing routines as well.
Sqrt, div, lots of stuff to add.. =)

#26263 - isildur - Fri Sep 10, 2004 5:50 pm

Hey that's great, I'll PM you soon and we will put something up.

#26372 - [mRg] - Tue Sep 14, 2004 12:24 am

Sounds like a fantastic idea guys ! keep us posted :)

[mRg]

#26390 - f(DarkAngel) - Tue Sep 14, 2004 3:02 pm

Well, for a start, here's my division implementation for Demi (the lib I'm working on; the sf.net CVS is terribly outdated and abandoned, and I'm still waiting for approval at Savannah):

Code:

@ This program is free software; you can redistribute it and/or modify
@ it under the terms of the GNU General Public License as published by
@ the Free Software Foundation; either version 2 of the License, or
@ (at your option) any later version.

.align
.arm

div:
#ifdef DIV_ACTS_AS_SWI6          @ preserve r2 & r12, r3=|a/b|, r1=a%b
    stmfd  sp!, {r0-r2, r12}
#endif

    eor    r12, r0, r1           @ sgn(r12) == sgn(r0*r1)
    movs   r2, r0, lsl #1        @ r0 = abs(r0)
    rsbcs  r0, r0, #0
    movs   r2, r1, lsl #1        @ r1 = abs(r1)
    rsbcs  r1, r1, #0
    mov    r2, #1                @ bit_c
0:
    cmp    r0, r1
    bls    1f
    mov    r1, r1, lsl #1
    movs   r2, r2, lsl #1
    bne    0b
1:

    mov    r3, #0                @ r3 = |res|
    tst    r2, r2
    beq    3f

2:
    cmp    r1, r0
    subls  r0, r0, r1
    orrls  r3, r3, r2
    mov    r1, r1, lsr #1
    movs   r2, r2, lsr #1
    bne    2b
3:
    movs   r12, r12, lsl #1
#ifdef DIV_ACTS_AS_SWI6
    ldmfd  sp!, {r0-r2, r12}
    mul    r1, r3, r1
    rsbcs  r1, r1, #0
    sub    r1, r0, r1
#endif

#ifdef DIV_UNSIGNED
    mov    r0, r3
#else   
    movcc  r0, r3
    rsbcs  r0, r3, #0
#endif

#if defined(DIV_ACTS_AS_SWI6) && !defined(DIV_UNSIGNED) && defined(DIV_MOD_ABS)    @ r1 = |a%b|
    cmp    r1, #0
    rsblt  r1, r1, #0
#endif

    bx    lr


It's not unrolled, for the sake of small code, since it's assumed to be placed in IWRAM.
The fastest and shortest version is the one with DIV_UNSIGNED defined and DIV_ACTS_AS_SWI6 undefined (the loops remain the same, so it won't be a big difference). A Thumb version would be much slower, so I didn't bother.

Code:
udiv:
    mov    r2, #1            @ bit_c
0:
    cmp    r0, r1
    bls    1f
    mov    r1, r1, lsl #1
    movs   r2, r2, lsl #1
    bne    0b

1:
    mov    r3, #0            @ r3 = |res|
    tst    r2, r2
    beq    3f

2:
    cmp    r1, r0
    subls  r0, r0, r1
    orrls  r3, r3, r2
    mov    r1, r1, lsr #1
    movs   r2, r2, lsr #1
    bne    2b
3:
    mov    r0, r3

    bx    lr
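For reference, here is the same shift-and-subtract algorithm as udiv, written in C. It is handy for checking the ARM version on a PC; udiv_model is a hypothetical name.

There is one difference. The ARM alignment loop has no guard against shifting the divisor past bit 31. The signed div never hits that case, because its operands are absolute values of signed numbers. This sketch stops aligning once the top bit is set, so it also works for full 32-bit unsigned inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-subtract unsigned division: line the divisor up under the
 * dividend, then walk back down, subtracting wherever it fits and
 * setting the matching quotient bit. */
uint32_t udiv_model(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t bit = 1, q = 0;

    assert(d != 0);
    while (n > d && !(d & 0x80000000u)) {   /* align d with n (loop 0) */
        d <<= 1;
        bit <<= 1;
    }
    while (bit) {                             /* one quotient bit per pass (loop 2) */
        if (d <= n) {
            n -= d;
            q |= bit;
        }
        d >>= 1;
        bit >>= 1;
    }
    if (rem)
        *rem = n;                             /* what is left over is the remainder */
    return q;
}
```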


BTW, disassembling DarkFader's BIOS dump (in CowBite and VBA) gave the following SWI 6 and SWI 7 code. It includes unnecessary stuff such as:

- swapping r0 and r1;
- an unused cmp r0,r2 after label 1;
- three hi-conditional instructions where a single bls could be used.

It also seems to be erroneous: it never gave a correct result, and I couldn't work out what it's actually intended to do...

Code:

swi6:
    mov    r3, r0
    mov    r0, r1
    mov    r1, r3
    ands   r3, r1, #(1<<31)
    rsbmi  r1, r1, #0
    eors   r12, r3, r0, asr #20        @ CowBite didn't output an asr #20
    rsbcs  r0, r0, #0
    movs   r2, r1
0:
    cmps   r2, r0, lsr #1
    movls  r2, r2, lsl #1
    bcc    0b
1:
    cmps   r0, r2
    adc    r3, r3, r3
    subcs  r0, r0, r2
    teqs   r2, r1
    movne  r2, r2, lsr #1
    bne    1b

    mov    r1, r0
    mov    r0, r3
    movs   r12, r12, lsl #1
    rsbcs  r0, r0, #0
    rsbmi  r1, r1, #0
    bx     lr
   
swi7:
    stmfd  sp!, {r4}
    mov    r12, r0
    mov    r1, #1

0:
    cmp    r0, r1
    movhi  r0, r0, lsr #1
    movhi  r1, r1, lsl #1
    bhi    0b
1:
    mov    r0, r12
    mov    r4, r1
    mov    r3, #0
    mov    r2, r1
2:
    cmps   r0, r2
    adc    r3, r3, r3
    subcs  r0, r0, r2
    teqs   r2, r1
    movne  r2, r2, lsr #1
    bne    2b
   
    add    r1, r1, r3
    movs   r1, r1, lsr #1
    cmp    r1, r4
    bcc    1b
    mov    r0, r4
    ldmfd  sp!, {r4}
    bx     lr


Has anyone got the BIOS dump working? Or does anyone have a working dump?

#26393 - isildur - Tue Sep 14, 2004 3:43 pm

Thanks, I will add it to the upcoming repository. But, is this faster than using swi 0x60000 ?

#26395 - f(DarkAngel) - Tue Sep 14, 2004 5:18 pm

I guess so, when placed in IWRAM.
However, since I haven't seen the SWI 6 source (not counting the non-working dump), and I don't have the actual hardware (plus emulator timing isn't perfect), I can only guess.

#26400 - SmileyDude - Tue Sep 14, 2004 7:09 pm

Depending on what you are using the output for, you could optimise differently. For example, in one game I was working on, I just wanted a 5-digit, zero-padded number. Since I knew my numbers were always going to be right-aligned, I could fill in the output string directly.

Here is the code I was using (I'm using a / and a % -- probably should modify it to only use the / like the example above, but for demonstration purposes this works):

Code:
for(int i = 4; i >= 0; i--) {
    buffer[i] = '0' + aNumber % 10;
    aNumber /= 10;
}


Basically, the number is passed in via aNumber and the result is returned in buffer. In my routine, I use a static buffer, since I'm using the result right away -- would be really easy to modify to use a passed in buffer, but I didn't feel like it for my proggy.
_________________
dennis
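Here is the loop wrapped up as a complete function. The name format5 and the caller-supplied buffer are assumptions; SmileyDude used a static buffer. Because the digits are filled from the right, the padding zeros come for free, and any digits beyond the fifth are silently dropped.

```c
#include <assert.h>
#include <string.h>

/* Write aNumber as exactly 5 zero-padded digits into buffer[6].
 * Values above 99999 keep only their last five digits. */
void format5(unsigned aNumber, char buffer[6])
{
    for (int i = 4; i >= 0; i--) {
        buffer[i] = (char)('0' + aNumber % 10);
        aNumber /= 10;
    }
    buffer[5] = '\0';
}
```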

#26413 - keldon - Wed Sep 15, 2004 1:20 am

Here is my IntToStr for x86. D$ means DWORD PTR and B$ means BYTE PTR. If num has more than one digit, you extract the last digit like this: work out num with its last digit zeroed, then subtract that from num.

Had we been converting an int to a hex string, we would simply AND with 0xFFFFFFF0 to clear the low nibble. Since we are working in decimal, (num / 10) * 10 does the job.

At the end of the End_If statement, eax = num. EDI is incremented, so after a recursive call returns, the write position has moved on. For num = 100, two recursive calls are made before the original call writes its digit. EAX is then multiplied by 10 before returning, which is how the caller gets (num / 10) * 10 back (note that EAX is divided by 10 before each recursive call is made).

Code:
; int intToStr ( num, str )
; eax = number to convert
; edi = address of string
;
; PURPOSE:
;    convert integer, num to a String at address str
;
; RETURNS:
;    eax = eax * 10
;    edi = edi + 1
intToStr:

    If eax < 10
        add eax '0'             ; eax = num + '0'
        mov B$edi al
        sub eax '0'             ; eax = num
       
    Else
        push eax                ; store num
       
            xor edx edx
            mov ebx 10         
            div ebx             ; eax = num / 10
           
            call intToStr
                ; eax = (num / 10) * 10
            mov ebx eax
           
        pop eax                 ; restore num
        push eax                ; store num
       
            sub eax ebx             ; eax = num, ebx = (num / 10) * 10
       
            add eax '0'         ; copy number to string
            mov B$edi al
           
        pop eax
       
    End_If
   
    mov ebx 10              ; eax = eax * 10
    mul ebx                 ; .
   
    inc edi
   
    ret


For those who do not know x86 assembly, here is what you're missing. First of all, If and End_If are macros. AL is the low byte of EAX, which is 32-bit. EBX, EAX and EDI are all 32-bit registers. EDX is cleared before dividing EAX by EBX because the division uses EDX as the high DWORD of the dividend.
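Here is the same idea in C, for readers who don't speak x86: recurse on num / 10 so the higher digits are written first, then emit num - (num / 10) * 10. This is a sketch rather than a line-by-line translation, and the name int_to_str_rec is made up. It differs from the original in three ways:

- It recomputes (num / 10) * 10 directly instead of passing it back in EAX.
- It returns the advanced write pointer in place of EDI.
- Like the original, it doesn't write a terminating zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write the decimal digits of num starting at str (no terminator).
 * Returns the position just past the last digit written. */
char *int_to_str_rec(uint32_t num, char *str)
{
    if (num >= 10)
        str = int_to_str_rec(num / 10, str);        /* higher digits first */
    *str++ = (char)('0' + (num - (num / 10) * 10)); /* then the last digit */
    return str;
}
```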

#26470 - f(DarkAngel) - Thu Sep 16, 2004 2:38 pm

some basic stuff...

Code:
@ void* plot_modeX(void* vram, u8 x, u8 y, u16 color)
@ return: address of the destination pixel

.align
.arm

plot_mode3:
    add   r1, r2, lsl #8
    sub   r1, r2, lsl #4
    add   r0, r1, lsl #1
    strh  r3, [r0]
    bx    lr

.align
.arm

@ r3 is assumed to be a combination of two adjacent pixels, so x should be even
plot_mode4:
    add   r0, r0, r1           @ + x (1 byte per pixel)
    add   r0, r2, lsl #8
    sub   r0, r2, lsl #4       @ + y*240
    strh  r3, [r0]
    bx    lr


.align
.arm

plot_mode5:
    add   r1, r2, lsl #7
    add   r1, r2, lsl #5
    add   r0, r1, lsl #1
    strh  r3, [r0]
    bx    lr


@ void/void* flip()
@ return: nothing/vram

.align
.arm

@ doesn't look like the fastest it could be...
flip:
    ldr   r1, =0x4000000       @ REG_DISPCNT
    ldrh  r2, [r1]
    mov   r3, #0x10            @ page select bit
    eor   r2, r2, r3
#ifdef FLIP_RETURNS_VRAM
    tst   r2, r3
    ldr   r0, =0x6000000       @ MEM_VRAM
    ldrne r3, =0xa000          @ PAGE_SIZE
    addne r0, r0, r3
#endif
    strh  r2, [r1]             @ write back whether or not VRAM is returned
    bx    lr


I've just written these, so I suggest you test them before adding them.


Last edited by f(DarkAngel) on Thu Sep 16, 2004 6:35 pm; edited 1 time in total

#26473 - FluBBa - Thu Sep 16, 2004 3:44 pm

Mode3 plotter (quote DekuTree64)
So you don't have to supply the VRAM base address (as you don't have any pages).
Code:

@r0 = x
@r1 = y
@r2 = color
.global PlotPixel
.arm
.align 2
.section .iwram, "ax", %progbits
PlotPixel:
rsb   r1, r1, r1, LSL #4    @r1 = y*15
add   r1, r0, r1, LSL #5    @r1 = x+y*15*32 = x+y*480
add   r0, r0, #0x6000000  @r0 = x+VRAM base
strh   r2, [r0, r1]            @this comes out to VRAM base+x*2+y*480
bx     lr

Flip code...

Code:

flip:
    mov   r1, #0x4000000       @ REG_DISPCNT
    ldrh  r2, [r1]
    eor   r2, r2, #0x10
#ifdef FLIP_RETURNS_VRAM
    tst   r2, #0x10
    mov   r0, #0x6000000       @ MEM_VRAM
    addne r0, #0xA000          @ PAGE_SIZE
#endif
    strh  r2, [r1]
    bx    lr
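To double-check the address arithmetic in these plotters without hardware, you can model the offsets in C. All names are made up for this sketch. It assumes the standard bitmap layouts:

- mode 3: 240x160, 2 bytes per pixel;
- mode 4: 240x160, 1 byte per pixel;
- mode 5: 160x128, 2 bytes per pixel.

The *_shifts and plotpixel versions spell out the shift-and-add tricks used above, so they can be compared against the straightforward formulas.

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward byte offsets of pixel (x, y) from the start of a page. */
uint32_t mode3_offset(uint32_t x, uint32_t y) { return (y * 240 + x) * 2; }
uint32_t mode4_offset(uint32_t x, uint32_t y) { return y * 240 + x; }
uint32_t mode5_offset(uint32_t x, uint32_t y) { return (y * 160 + x) * 2; }

/* y*240 as in plot_mode3/plot_mode4: y*256 - y*16 */
uint32_t row240_shifts(uint32_t y) { return (y << 8) - (y << 4); }

/* y*160 as in plot_mode5: y*128 + y*32 */
uint32_t row160_shifts(uint32_t y) { return (y << 7) + (y << 5); }

/* FluBBa's PlotPixel: r1 = y*15, then x + (x + r1*32) = 2x + 480y */
uint32_t mode3_offset_plotpixel(uint32_t x, uint32_t y)
{
    uint32_t r1 = (y << 4) - y;      /* rsb r1, r1, r1, LSL #4 */
    r1 = x + (r1 << 5);              /* add r1, r0, r1, LSL #5 */
    return x + r1;                   /* strh r2, [r0, r1] with r0 = base + x */
}
```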


#27190 - Kay - Tue Oct 05, 2004 9:22 pm

In most cases my previous code sample may be sufficient (for scores, bullet counts, lives, ...), but in some time-critical applications you may want it to run much faster.


If anyone is still interested, there's a faster way at the cost of only 3-4 KB of lookup tables. It needs just 2 BCD additions and runs in constant time for 24-bit values (0 to 16,777,215).



The 32-bit lookup table values are generated like this (for 8 digits):

Code:
for i = 0 to 255
 val_00_07(i) = BCD(i)
 val_08_15(i) = BCD(i shl 8)
 val_16_23(i) = BCD(i shl 16)
next




and then your integer is converted to BCD like this:


Code:
v1 = a AND $FF
v2 = (a SHR 8) AND $FF
v3 = (a SHR 16) AND $FF

a = BCD_ADD( val_00_07(v1) , val_08_15(v2) )
a = BCD_ADD( a , val_16_23(v3) )



Don't ask me for source code; I only give clues so that you learn how this works ;)


-- Kay
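For readers who do want source, here's a C sketch of the table-driven scheme within its limits: 24-bit input, and three 256-entry tables of 32-bit words, which comes to 3 KB. All names are made up for this sketch. BCD_ADD is implemented with the same +6 bias trick from earlier in the thread, and the tables are built at startup rather than stored precomputed.

```c
#include <assert.h>
#include <stdint.h>

/* Three 256-entry tables of 32-bit BCD values: 3 x 1 KB. */
static uint32_t lut0[256], lut8[256], lut16[256];

/* Plain digit-by-digit BCD packing, only used to build the tables. */
uint32_t to_bcd_slow(uint32_t n)
{
    uint32_t bcd = 0;
    for (int shift = 0; n != 0; shift += 4, n /= 10)
        bcd |= (n % 10) << shift;
    return bcd;
}

/* BCD_ADD: packed-BCD add using the +6 bias trick. */
uint32_t bcd_add(uint32_t a, uint32_t b)
{
    uint32_t t1 = a + 0x06666666u;               /* bias the low 7 digits by 6 */
    uint32_t t2 = t1 + b;                        /* binary add; digit sums >= 10 carry */
    uint32_t t5 = ~(t2 ^ t1 ^ b) & 0x11111110u;  /* nibbles that got no carry-in */
    return t2 - ((t5 >> 2) | (t5 >> 3));         /* remove the 6 from the nibble below each */
}

void init_bcd_luts(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        lut0[i]  = to_bcd_slow(i);          /* val_00_07 */
        lut8[i]  = to_bcd_slow(i << 8);     /* val_08_15 */
        lut16[i] = to_bcd_slow(i << 16);    /* val_16_23 */
    }
}

/* 24-bit binary -> 8-digit packed BCD: three lookups, two BCD adds. */
uint32_t bin24_to_bcd(uint32_t a)
{
    assert(a <= 0xFFFFFFu);
    uint32_t r = bcd_add(lut0[a & 0xFF], lut8[(a >> 8) & 0xFF]);
    return bcd_add(r, lut16[(a >> 16) & 0xFF]);
}
```

init_bcd_luts() must be called once before the first conversion. On the GBA you would more likely generate the tables offline and keep them in ROM.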

#34744 - tum_ - Tue Jan 25, 2005 11:30 am

Hi guys, sorry for reviving this old thread that I've just come across.
It looks to me that this:

FluBBa wrote:
A little recoding to save some registers...
Code:

[snipped]

   mvn   r2, r2               @ t3 = ~t3
   and   r2, r2, r5            @ t3 = t3 AND $11111110

[snipped]




can be replaced by this:

Code:


   bic   r2, r5, r2            @ t3 = ~t3 AND $11111110



I'm not a GBA programmer, so correct me if I'm wrong, but I would use r12 instead of r8 in your function, because according to the APCS you don't need to save/restore it.

#34745 - tum_ - Tue Jan 25, 2005 12:42 pm

You don't need r7 either, as r2 can be re-used:

FluBBa wrote:
A little recoding to save some registers...
Code:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@
@ u32 Bin2BCD(u32 hexVal);
@
[snip]
_Binary_to_BCD_conversion_loop:
   ldr   r7, [r3], #4               @ BCD (read BCD table)
   movs   r0, r0, lsr #1               @ test bit

   bcc   _Binary_to_BCD_conversion_loop_check
_Binary_to_BCD_calc:
   add   r8, r4, r1               @ t1 = $06666666 + a
   add   r6, r8, r7               @ t2 = t1 + BCD
   eor   r2, r8, r7            @ t3 = t1 XOR BCD
   eor   r2, r6, r2            @ t3 = t2 XOR t3
   mvn   r2, r2               @ t3 = ~t3
   and   r2, r2, r5            @ t3 = t3 AND $11111110
   orr   r8, r2, r2, lsr #1        @ t1 = t3 OR t3 LSR 1
   sub   r1, r6, r8, lsr #2           @ a = t2 - (t1 LSR 2)

_Binary_to_BCD_conversion_loop_check
[snip]



Becomes:

Code:

_Binary_to_BCD_conversion_loop:
   ldr   r2, [r3], #4               @ BCD (read BCD table)
   movs   r0, r0, lsr #1               @ test bit

   bcc   _Binary_to_BCD_conversion_loop_check
_Binary_to_BCD_calc:
   add   r8, r4, r1               @ t1 = $06666666 + a
   add   r6, r8, r2               @ t2 = t1 + BCD
@ here we re-use r2 for t3 as we don't need BCD anymore:
   eor   r2, r8, r2            @ t3 = t1 XOR BCD
   eor   r2, r6, r2            @ t3 = t2 XOR t3
   bic   r2, r5, r2            @ t3 = ~t3 AND $11111110
   orr   r8, r2, r2, lsr #1        @ t1 = t3 OR t3 LSR 1
   sub   r1, r6, r8, lsr #2           @ a = t2 - (t1 LSR 2)

_Binary_to_BCD_conversion_loop_check:


Not tested.

#34796 - FluBBa - Wed Jan 26, 2005 11:26 am

Nice, some more register removal...
Code:

_Binary_to_BCD_conversion_loop:
   ldr   r2, [r3], #4               @ BCD (read BCD table)
   movs   r0, r0, lsr #1               @ test bit

   bcc   _Binary_to_BCD_conversion_loop_check
_Binary_to_BCD_calc:
   add   r1, r4, r1               @ t1 = $06666666 + a
   add   r12, r1, r2               @ t2 = t1 + BCD
@ here we re-use r2 for t3 as we don't need BCD anymore:
   eor   r2, r1, r2            @ t3 = t1 XOR BCD
   eor   r2, r12, r2            @ t3 = t2 XOR t3
   bic   r2, r5, r2            @ t3 = ~t3 AND $11111110
   orr   r1, r2, r2, lsr #1        @ t1 = t3 OR t3 LSR 1
   sub   r1, r12, r1, lsr #2           @ a = t2 - (t1 LSR 2)

_Binary_to_BCD_conversion_loop_check:


#34810 - isildur - Wed Jan 26, 2005 5:26 pm

This optimization is now updated on the
GBA ARM Code Repository.

mod edit: And the link is now fixed.

#35054 - Lupin - Sat Jan 29, 2005 10:45 pm

Why did you bump that old post? ;)

It wasn't really about BCD conversion, and I think BCD on its own isn't much use in real projects, since it would of course be hard for the average user to read the numbers =)

Well, I figured that for real int->string conversion of smaller numbers (16 bits), a table might be best (just counting how many times I can subtract 10000, 1000, 100, 10 and so on). I used this on the Pokemon Mini too, and in most cases you won't need to do it very often anyway (maybe a few times per frame).
_________________
Team Pokeme
My blog and PM ASM tutorials
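Lupin's subtract-the-powers-of-ten approach needs no divide or multiply at all. Each digit takes at most nine subtractions. Here's a sketch in C for 16-bit values; the name u16_to_str and the leading-zero handling are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert a 16-bit value by counting how many times each power of ten
 * can be subtracted. Leading zeros are skipped.
 * out must hold at least 6 chars ("65535" + '\0'). */
void u16_to_str(uint16_t value, char *out)
{
    static const uint16_t pow10[] = { 10000, 1000, 100, 10, 1 };
    uint32_t v = value;
    int started = 0;

    for (int i = 0; i < 5; i++) {
        char digit = '0';
        while (v >= pow10[i]) {      /* at most 9 subtractions per digit */
            v -= pow10[i];
            digit++;
        }
        if (digit != '0' || started || i == 4) {   /* always print the last digit */
            *out++ = digit;
            started = 1;
        }
    }
    *out = '\0';
}
```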

#41497 - gator - Fri Apr 29, 2005 4:50 am

I optimized this further by adding 0x06666666 to all the entries in the table...

Code:

Bin2BCD:
    stmfd   sp!, {r4,r5}

   mov   r1, #0
   ldr   r3, =BCD_conversion_table      @ BCD
   ldr   r5, =0x11111110                @ carry count mask value

_Binary_to_BCD_conversion_loop:
   ldr   r2, [r3], #4             @ BCD (read BCD table)
   movs  r0, r0, lsr #1           @ test bit

   bcc   _Binary_to_BCD_conversion_loop_check
_Binary_to_BCD_calc:
   add   r4, r2, r1               @  r4 = r2 + r1
   eor   r2, r1, r2               @  r2 = r1 XOR r2
   eor   r2, r4, r2               @  r2 = r4 XOR r2
   bic   r2, r5, r2               @  r2 = 0x11111110 AND NOT r2
   orr   r2, r2, r2, lsr #1       @  r2 = r2 OR (r2 LSR 1)
   sub   r1, r4, r2, lsr #2       @  r1 = r4 -  (r2 LSR 2)

_Binary_to_BCD_conversion_loop_check:
   bne   _Binary_to_BCD_conversion_loop
   mov r0, r1

   ldmfd   sp!, {r4,r5}
   bx lr


   .pool

BCD_conversion_table:
   .word 0x06666667
   .word 0x06666668
   .word 0x0666666a
   .word 0x0666666e
   .word 0x0666667c
   .word 0x06666698
   .word 0x066666ca
   .word 0x0666678e
   .word 0x066668bc
   .word 0x06666b78
   .word 0x0666768a
   .word 0x066686ae
   .word 0x0666a6fc
   .word 0x0666e7f8
   .word 0x0667c9ea
   .word 0x06698dce
   .word 0x066cbb9c
   .word 0x067976d8
   .word 0x068c87aa
   .word 0x06b8a8ee
   .word 0x076aebdc
   .word 0x086fd7b8
   .word 0x0a7fa96a
   .word 0x0e9eec6e
   .word 0x1cddd87c
   .word 0x39bbaa98
   .word 0x6d76eeca
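For readers who prefer C, here is a rough translation of gator's loop, reusing the 27 pre-biased entries of `BCD_conversion_table` from the post above (each is the BCD form of 2^i with 0x06666666 already added). The names `bcd_add_biased`, `bin2bcd` and `bcd_to_str` are hypothetical, and the input is assumed to be at most 99999999 so the result fits in eight BCD digits. `bcd_to_str` is an added helper showing that packed BCD becomes text by emitting one nibble per character.

```c
#include <assert.h>
#include <stdint.h>

/* gator's table: BCD form of 2^i with 0x06666666 pre-added, i = 0..26. */
static const uint32_t bcd_table[27] = {
    0x06666667, 0x06666668, 0x0666666a, 0x0666666e, 0x0666667c, 0x06666698,
    0x066666ca, 0x0666678e, 0x066668bc, 0x06666b78, 0x0666768a, 0x066686ae,
    0x0666a6fc, 0x0666e7f8, 0x0667c9ea, 0x06698dce, 0x066cbb9c, 0x067976d8,
    0x068c87aa, 0x06b8a8ee, 0x076aebdc, 0x086fd7b8, 0x0a7fa96a, 0x0e9eec6e,
    0x1cddd87c, 0x39bbaa98, 0x6d76eeca
};

/* Packed BCD add: biased_a = a + 0x06666666, b is plain BCD. Bit 4k of
   `carries` is the carry into digit k; digits that produced no carry still
   hold their +6 bias, which the final subtraction removes. */
static uint32_t bcd_add_biased(uint32_t biased_a, uint32_t b)
{
    uint32_t sum = biased_a + b;                      /* add r4, r2, r1   */
    uint32_t carries = sum ^ biased_a ^ b;            /* the two eor steps */
    uint32_t no_carry = 0x11111110u & ~carries;       /* bic with the mask */
    return sum - ((no_carry | (no_carry >> 1)) >> 2); /* orr, then sub    */
}

/* One table entry per set bit of n, lowest bit first, as in the asm loop. */
uint32_t bin2bcd(uint32_t n)
{
    assert(n <= 99999999u);             /* result must fit in 8 digits */
    uint32_t bcd = 0;
    for (int i = 0; n != 0; i++, n >>= 1)
        if (n & 1)
            bcd = bcd_add_biased(bcd_table[i], bcd);
    return bcd;
}

/* Unpacks 8 BCD digits into buf (at least 9 bytes), skipping leading zeros. */
int bcd_to_str(uint32_t bcd, char *buf)
{
    int len = 0;
    for (int shift = 28; shift >= 0; shift -= 4) {
        char digit = (char)('0' + ((bcd >> shift) & 0xF));
        if (len > 0 || digit != '0' || shift == 0)
            buf[len++] = digit;
    }
    buf[len] = '\0';
    return len;
}
```

As in the assembly, the loop stops as soon as the remaining bits are all zero, so small values finish in only a few iterations.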

#41501 - tepples - Fri Apr 29, 2005 5:10 am

Lupin wrote:
Why did you bump that old post? ;)

It is always better to post in an existing thread than to start a new one (Flash).
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#41524 - isildur - Fri Apr 29, 2005 2:21 pm

Cool, I will test it and add it to the GBA ARM Code Repository.

#41527 - FluBBa - Fri Apr 29, 2005 2:48 pm

Gator: very nice work.
Just swap r5 for r12 and you also save one of the stack accesses.

#42018 - Kay - Wed May 04, 2005 12:40 pm

As I said earlier in this thread, there's a faster way using a 3KB BCD lookup table.
This removes the inner loop, improves overall performance, and makes the execution time fixed (the worst and best cases take the same amount of machine time).
Please take a look (GoldRoad ASM format, using some optimisations suggested by FluBBa and isildur; thanks to them):

Code:

   @ARM

   ldr   r0,=1234567
   and   r1,r0,$ff

   ldr   r3,=BCD_add_table   ; BCD lookup table

   ldr   r2,[r3,r1 lsl 2]   ; BCD (read BCD table)

   mov   r1,r0 lsr 8
   and   r1,r1,$ff
   add   r3,r3,1024
   ldr   r1,[r3,r1 lsl 2]   ; BCD (read BCD table)

   ldr   r5,=$11111110      ; carry count mask value

   add   r4,r2,r1      ; r4 = r2 + r1
   eor   r2,r1,r2      ; r2 = r1 XOR r2
   eor   r2,r4,r2      ; r2 = r4 XOR r2
   bic   r2,r5,r2      ; r2 = 0x11111110 AND NOT r2
   orr   r2,r2,r2 lsr 1      ; r2 = r2 OR (r2 LSR 1)
   sub   r1,r4,r2 lsr 2      ; r1 = r4 -  (r2 LSR 2)

   mov   r2,r0 lsr 16
   and   r2,r2,$ff
   add   r3,r3,1024
   ldr   r2,[r3,r2 lsl 2]   ; BCD (read BCD table)

   add   r4,r2,r1      ; r4 = r2 + r1
   eor   r2,r1,r2      ; r2 = r1 XOR r2
   eor   r2,r4,r2      ; r2 = r4 XOR r2
   bic   r2,r5,r2      ; r2 = 0x11111110 AND NOT r2
   orr   r2,r2,r2 lsr 1      ; r2 = r2 OR (r2 LSR 1)
   sub   r1,r4,r2 lsr 2      ; r1 = r4 -  (r2 LSR 2)

loop:
   b   loop

BCD_add_table:
; bits 0 - 7
   @dcd   $00000000
   @dcd   $06666667,$06666668,$06666669,$0666666A,$0666666B,$0666666C,$0666666D,$0666666E,$0666666F,$06666676,$06666677,$06666678,$06666679,$0666667A,$0666667B,$0666667C
   @dcd   $0666667D,$0666667E,$0666667F,$06666686,$06666687,$06666688,$06666689,$0666668A,$0666668B,$0666668C,$0666668D,$0666668E,$0666668F,$06666696,$06666697,$06666698
   @dcd   $06666699,$0666669A,$0666669B,$0666669C,$0666669D,$0666669E,$0666669F,$066666A6,$066666A7,$066666A8,$066666A9,$066666AA,$066666AB,$066666AC,$066666AD,$066666AE
   @dcd   $066666AF,$066666B6,$066666B7,$066666B8,$066666B9,$066666BA,$066666BB,$066666BC,$066666BD,$066666BE,$066666BF,$066666C6,$066666C7,$066666C8,$066666C9,$066666CA
   @dcd   $066666CB,$066666CC,$066666CD,$066666CE,$066666CF,$066666D6,$066666D7,$066666D8,$066666D9,$066666DA,$066666DB,$066666DC,$066666DD,$066666DE,$066666DF,$066666E6
   @dcd   $066666E7,$066666E8,$066666E9,$066666EA,$066666EB,$066666EC,$066666ED,$066666EE,$066666EF,$066666F6,$066666F7,$066666F8,$066666F9,$066666FA,$066666FB,$066666FC
   @dcd   $066666FD,$066666FE,$066666FF,$06666766,$06666767,$06666768,$06666769,$0666676A,$0666676B,$0666676C,$0666676D,$0666676E,$0666676F,$06666776,$06666777,$06666778
   @dcd   $06666779,$0666677A,$0666677B,$0666677C,$0666677D,$0666677E,$0666677F,$06666786,$06666787,$06666788,$06666789,$0666678A,$0666678B,$0666678C,$0666678D,$0666678E
   @dcd   $0666678F,$06666796,$06666797,$06666798,$06666799,$0666679A,$0666679B,$0666679C,$0666679D,$0666679E,$0666679F,$066667A6,$066667A7,$066667A8,$066667A9,$066667AA
   @dcd   $066667AB,$066667AC,$066667AD,$066667AE,$066667AF,$066667B6,$066667B7,$066667B8,$066667B9,$066667BA,$066667BB,$066667BC,$066667BD,$066667BE,$066667BF,$066667C6
   @dcd   $066667C7,$066667C8,$066667C9,$066667CA,$066667CB,$066667CC,$066667CD,$066667CE,$066667CF,$066667D6,$066667D7,$066667D8,$066667D9,$066667DA,$066667DB,$066667DC
   @dcd   $066667DD,$066667DE,$066667DF,$066667E6,$066667E7,$066667E8,$066667E9,$066667EA,$066667EB,$066667EC,$066667ED,$066667EE,$066667EF,$066667F6,$066667F7,$066667F8
   @dcd   $066667F9,$066667FA,$066667FB,$066667FC,$066667FD,$066667FE,$066667FF,$06666866,$06666867,$06666868,$06666869,$0666686A,$0666686B,$0666686C,$0666686D,$0666686E
   @dcd   $0666686F,$06666876,$06666877,$06666878,$06666879,$0666687A,$0666687B,$0666687C,$0666687D,$0666687E,$0666687F,$06666886,$06666887,$06666888,$06666889,$0666688A
   @dcd   $0666688B,$0666688C,$0666688D,$0666688E,$0666688F,$06666896,$06666897,$06666898,$06666899,$0666689A,$0666689B,$0666689C,$0666689D,$0666689E,$0666689F,$066668A6
   @dcd   $066668A7,$066668A8,$066668A9,$066668AA,$066668AB,$066668AC,$066668AD,$066668AE,$066668AF,$066668B6,$066668B7,$066668B8,$066668B9,$066668BA,$066668BB
;bits 8 - 15:
   @dcd   $00000000
   @dcd   $066668BC,$06666B78,$06666DCE,$0666768A,$066678E6,$06667B9C,$06667DF8,$066686AE,$0666896A,$06668BC6,$06668E7C,$066696D8,$0666998E,$06669BEA,$06669EA6,$0666A6FC
   @dcd   $0666A9B8,$0666AC6E,$0666AECA,$0666B786,$0666B9DC,$0666BC98,$0666BEEE,$0666C7AA,$0666CA66,$0666CCBC,$0666CF78,$0666D7CE,$0666DA8A,$0666DCE6,$0666DF9C,$0666E7F8
   @dcd   $0666EAAE,$0666ED6A,$0666EFC6,$0666F87C,$0666FAD8,$0666FD8E,$0666FFEA,$066768A6,$06676AFC,$06676DB8,$0667766E,$066778CA,$06677B86,$06677DDC,$06678698,$066788EE
   @dcd   $06678BAA,$06678E66,$066796BC,$06679978,$06679BCE,$06679E8A,$0667A6E6,$0667A99C,$0667ABF8,$0667AEAE,$0667B76A,$0667B9C6,$0667BC7C,$0667BED8,$0667C78E,$0667C9EA
   @dcd   $0667CCA6,$0667CEFC,$0667D7B8,$0667DA6E,$0667DCCA,$0667DF86,$0667E7DC,$0667EA98,$0667ECEE,$0667EFAA,$0667F866,$0667FABC,$0667FD78,$0667FFCE,$0668688A,$06686AE6
   @dcd   $06686D9C,$06686FF8,$066878AE,$06687B6A,$06687DC6,$0668867C,$066888D8,$06688B8E,$06688DEA,$066896A6,$066898FC,$06689BB8,$06689E6E,$0668A6CA,$0668A986,$0668ABDC
   @dcd   $0668AE98,$0668B6EE,$0668B9AA,$0668BC66,$0668BEBC,$0668C778,$0668C9CE,$0668CC8A,$0668CEE6,$0668D79C,$0668D9F8,$0668DCAE,$0668DF6A,$0668E7C6,$0668EA7C,$0668ECD8
   @dcd   $0668EF8E,$0668F7EA,$0668FAA6,$0668FCFC,$0668FFB8,$0669686E,$06696ACA,$06696D86,$06696FDC,$06697898,$06697AEE,$06697DAA,$06698666,$066988BC,$06698B78,$06698DCE
   @dcd   $0669968A,$066998E6,$06699B9C,$06699DF8,$0669A6AE,$0669A96A,$0669ABC6,$0669AE7C,$0669B6D8,$0669B98E,$0669BBEA,$0669BEA6,$0669C6FC,$0669C9B8,$0669CC6E,$0669CECA
   @dcd   $0669D786,$0669D9DC,$0669DC98,$0669DEEE,$0669E7AA,$0669EA66,$0669ECBC,$0669EF78,$0669F7CE,$0669FA8A,$0669FCE6,$0669FF9C,$066A67F8,$066A6AAE,$066A6D6A,$066A6FC6
   @dcd   $066A787C,$066A7AD8,$066A7D8E,$066A7FEA,$066A88A6,$066A8AFC,$066A8DB8,$066A966E,$066A98CA,$066A9B86,$066A9DDC,$066AA698,$066AA8EE,$066AABAA,$066AAE66,$066AB6BC
   @dcd   $066AB978,$066ABBCE,$066ABE8A,$066AC6E6,$066AC99C,$066ACBF8,$066ACEAE,$066AD76A,$066AD9C6,$066ADC7C,$066ADED8,$066AE78E,$066AE9EA,$066AECA6,$066AEEFC,$066AF7B8
   @dcd   $066AFA6E,$066AFCCA,$066AFF86,$066B67DC,$066B6A98,$066B6CEE,$066B6FAA,$066B7866,$066B7ABC,$066B7D78,$066B7FCE,$066B888A,$066B8AE6,$066B8D9C,$066B8FF8,$066B98AE
   @dcd   $066B9B6A,$066B9DC6,$066BA67C,$066BA8D8,$066BAB8E,$066BADEA,$066BB6A6,$066BB8FC,$066BBBB8,$066BBE6E,$066BC6CA,$066BC986,$066BCBDC,$066BCE98,$066BD6EE,$066BD9AA
   @dcd   $066BDC66,$066BDEBC,$066BE778,$066BE9CE,$066BEC8A,$066BEEE6,$066BF79C,$066BF9F8,$066BFCAE,$066BFF6A,$066C67C6,$066C6A7C,$066C6CD8,$066C6F8E,$066C77EA,$066C7AA6
   @dcd   $066C7CFC,$066C7FB8,$066C886E,$066C8ACA,$066C8D86,$066C8FDC,$066C9898,$066C9AEE,$066C9DAA,$066CA666,$066CA8BC,$066CAB78,$066CADCE,$066CB68A,$066CB8E6
; bits 16 - 23
   @dcd   $00000000
   @dcd   $066CBB9C,$067976D8,$067FCC6E,$068C87AA,$0698DCE6,$069F987C,$06ABEDB8,$06B8A8EE,$06BEFE8A,$06CBB9C6,$06D86EFC,$06DECA98,$06EB7FCE,$06F7DB6A,$06FE96A6,$076AEBDC
   @dcd   $0777A778,$077DFCAE,$078AB7EA,$07976D86,$079DC8BC,$07AA7DF8,$07B6D98E,$07BD8ECA,$07C9EA66,$07D69F9C,$07DCFAD8,$07E9B66E,$07F66BAA,$07FCC6E6,$08697C7C,$086FD7B8
   @dcd   $087C8CEE,$0888E88A,$088F9DC6,$089BF8FC,$08A8AE98,$08AF69CE,$08BBBF6A,$08C87AA6,$08CECFDC,$08DB8B78,$08E7E6AE,$08EE9BEA,$08FAF786,$0967ACBC,$096E67F8,$097ABD8E
   @dcd   $098778CA,$098DCE66,$099A899C,$09A6DED8,$09AD9A6E,$09B9EFAA,$09C6AAE6,$09CD667C,$09D9BBB8,$09E676EE,$09ECCC8A,$09F987C6,$09FFDCFC,$0A6C9898,$0A78EDCE,$0A7FA96A
   @dcd   $0A8BFEA6,$0A98B9DC,$0A9F6F78,$0AABCAAE,$0AB87FEA,$0ABEDB86,$0ACB96BC,$0AD7EBF8,$0ADEA78E,$0AEAFCCA,$0AF7B866,$0AFE6D9C,$0B6AC8D8,$0B777E6E,$0B7DD9AA,$0B8A8EE6
   @dcd   $0B96EA7C,$0B9D9FB8,$0BA9FAEE,$0BB6B68A,$0BBD6BC6,$0BC9C6FC,$0BD67C98,$0BDCD7CE,$0BE98D6A,$0BEFE8A6,$0BFC9DDC,$0C68F978,$0C6FAEAE,$0C7C69EA,$0C88BF86,$0C8F7ABC
   @dcd   $0C9BCFF8,$0CA88B8E,$0CAEE6CA,$0CBB9C66,$0CC7F79C,$0CCEACD8,$0CDB686E,$0CE7BDAA,$0CEE78E6,$0CFACE7C,$0D6789B8,$0D6DDEEE,$0D7A9A8A,$0D86EFC6,$0D8DAAFC,$0D9A6698
   @dcd   $0DA6BBCE,$0DAD776A,$0DB9CCA6,$0DC687DC,$0DCCDD78,$0DD998AE,$0DDFEDEA,$0DECA986,$0DF8FEBC,$0DFFB9F8,$0E6C6F8E,$0E78CACA,$0E7F8666,$0E8BDB9C,$0E9896D8,$0E9EEC6E
   @dcd   $0EABA7AA,$0EB7FCE6,$0EBEB87C,$0ECB6DB8,$0ED7C8EE,$0EDE7E8A,$0EEAD9C6,$0EF78EFC,$0EFDEA98,$0F6A9FCE,$0F76FB6A,$0F7DB6A6,$0F8A6BDC,$0F96C778,$0F9D7CAE,$0FA9D7EA
   @dcd   $0FB68D86,$0FBCE8BC,$0FC99DF8,$0FCFF98E,$0FDCAECA,$0FE96A66,$0FEFBF9C,$0FFC7AD8,$1668D66E,$166F8BAA,$167BE6E6,$16889C7C,$168EF7B8,$169BACEE,$16A8688A,$16AEBDC6
   @dcd   $16BB78FC,$16C7CE98,$16CE89CE,$16DADF6A,$16E79AA6,$16EDEFDC,$16FAAB78,$176766AE,$176DBBEA,$177A7786,$1786CCBC,$178D87F8,$1799DD8E,$17A698CA,$17ACEE66,$17B9A99C
   @dcd   $17BFFED8,$17CCBA6E,$17D96FAA,$17DFCAE6,$17EC867C,$17F8DBB8,$17FF96EE,$186BEC8A,$1878A7C6,$187EFCFC,$188BB898,$18986DCE,$189EC96A,$18AB7EA6,$18B7D9DC,$18BE8F78
   @dcd   $18CAEAAE,$18D79FEA,$18DDFB86,$18EAB6BC,$18F76BF8,$18FDC78E,$196A7CCA,$1976D866,$197D8D9C,$1989E8D8,$19969E6E,$199CF9AA,$19A9AEE6,$19B66A7C,$19BCBFB8,$19C97AEE
   @dcd   $19CFD68A,$19DC8BC6,$19E8E6FC,$19EF9C98,$19FBF7CE,$1A68AD6A,$1A6F68A6,$1A7BBDDC,$1A887978,$1A8ECEAE,$1A9B89EA,$1AA7DF86,$1AAE9ABC,$1ABAEFF8,$1AC7AB8E,$1ACE66CA
   @dcd   $1ADABC66,$1AE7779C,$1AEDCCD8,$1AFA886E,$1B66DDAA,$1B6D98E6,$1B79EE7C,$1B86A9B8,$1B8CFEEE,$1B99BA8A,$1BA66FC6,$1BACCAFC,$1BB98698,$1BBFDBCE,$1BCC976A,$1BD8ECA6
   @dcd   $1BDFA7DC,$1BEBFD78,$1BF8B8AE,$1BFF6DEA,$1C6BC986,$1C787EBC,$1C7ED9F8,$1C8B8F8E,$1C97EACA,$1C9EA666,$1CAAFB9C,$1CB7B6D8,$1CBE6C6E,$1CCAC7AA,$1CD77CE6
; bits 24 - 31
   @dcd   $00000000
   @dcd   $1CDDD87C,$39BBAA98,$56997CAE,$6D76EECA,$89EEC6E6,$166CC98F,$17DAA6B7,$19A87DD8,$1B6FFAFA,$1CDDD87C,$1EABAF9D,$26798CBF,$27E769E6,$29AEE768,$2B7CBE8A,$2CEA9BAB
   @dcd   $2EB878CD,$367FEFEE,$37EDCD76,$39BBAA98,$3B8987B9,$3CF6FEDB,$3EBEDBFC,$468CB97E,$47FA96A6,$49C86DC7,$4B8FEAE9,$4CFDC86A,$4ECB9F8C,$56997CAE,$5866F9CF,$59CED6F7
   @dcd   $5B9CAE78,$5D6A8B9A,$5ED868BC,$669FDFDD,$686DBCFF,$69DB9A86,$6BA977A8,$6D76EECA,$6EDECBEB,$76ACA96D,$787A868E,$79E7FDB6,$7BAFDAD8,$7D7DB7F9,$7EEB8F7B,$86B96C9C
   @dcd   $8886E9BE,$89EEC6E6,$8BBC9E67,$8D8A7B89,$8EF7F8AA,$96BFCFCC,$988DACEE,$99FB8A6F,$9BC96797,$9D96DEB8,$9EFEBBDA



Some bugs may remain in special cases, but I haven't had time to spend on this piece of code yet.
This approach is also well suited to values wider than 32 bits.
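A C sketch of the byte-lookup variant, under stated assumptions: the input fits in 24 bits (as in Kay's later function version), and the three 256-entry tables are built at startup rather than typed in. The names `bcd_add`, `byte_bcd`, `init_byte_bcd` and `bin24_to_bcd` are hypothetical. Unlike the posted tables, these entries are stored without the 0x06666666 bias; `bcd_add` adds the bias to its first operand only, because the packed-BCD addition trick requires exactly one biased operand per addition.

```c
#include <assert.h>
#include <stdint.h>

/* Packed-BCD addition of two valid 8-digit BCD values (sum < 10^8). */
static uint32_t bcd_add(uint32_t a, uint32_t b)
{
    uint32_t t1 = a + 0x06666666u;                  /* bias one operand only */
    uint32_t t2 = t1 + b;
    uint32_t no_carry = 0x11111110u & ~(t2 ^ t1 ^ b);
    return t2 - ((no_carry | (no_carry >> 1)) >> 2);
}

/* byte_bcd[k][v] holds the BCD form of v << (8 * k), for k = 0..2. */
static uint32_t byte_bcd[3][256];

void init_byte_bcd(void)
{
    for (int k = 0; k < 3; k++) {
        uint32_t step = 1;                   /* becomes BCD of 1 << (8 * k) */
        for (int i = 0; i < 8 * k; i++)
            step = bcd_add(step, step);      /* doubling in BCD */
        byte_bcd[k][0] = 0;
        for (int v = 1; v < 256; v++)
            byte_bcd[k][v] = bcd_add(byte_bcd[k][v - 1], step);
    }
}

/* Fixed-cost conversion: three table reads and two BCD additions. */
uint32_t bin24_to_bcd(uint32_t n)
{
    assert(n <= 0xFFFFFFu);
    uint32_t lo  = byte_bcd[0][n & 0xFF];
    uint32_t mid = byte_bcd[1][(n >> 8) & 0xFF];
    uint32_t hi  = byte_bcd[2][(n >> 16) & 0xFF];
    return bcd_add(bcd_add(lo, mid), hi);
}
```

Call `init_byte_bcd()` once; after that every conversion does the same amount of work regardless of the value, which is the fixed-time property described above.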

Have fun !



-- Kay

#42023 - isildur - Wed May 04, 2005 3:04 pm

I will have to test this when I find some time. Thanks a lot Kay!

#42027 - Miked0801 - Wed May 04, 2005 6:06 pm

I particularly enjoy how it ends:

Code:

loop:
   b   loop


Very efficient ;)

#42031 - tepples - Wed May 04, 2005 6:55 pm

I too would prefer it as a function, with defined inputs and outputs, that I can drop into existing code.

#42124 - Kay - Fri May 06, 2005 12:14 pm

For tepples:



Code:

   @ARM

Binary_to_BCD_conversion:
; in  R0 => binary value [24 bits max]
; out R1 <= BCD value [8 digits 0 - 99999999]

   stmfd  sp!, {r2,r3,r4,r5}

   and   r1,r0,$ff

   ldr   r3,=BCD_add_table   ; BCD lookup table

   ldr   r2,[r3,r1 lsl 2]   ; BCD (read BCD table)

   mov   r1,r0 lsr 8
   and   r1,r1,$ff
   add   r3,r3,1024
   ldr   r1,[r3,r1 lsl 2]   ; BCD (read BCD table)

   ldr   r5,=$11111110      ; carry count mask value

   add   r4,r2,r1      ; r4 = r2 + r1
   eor   r2,r1,r2      ; r2 = r1 XOR r2
   eor   r2,r4,r2      ; r2 = r4 XOR r2
   bic   r2,r5,r2      ; r2 = 0x11111110 AND NOT r2
   orr   r2,r2,r2 lsr 1      ; r2 = r2 OR (r2 LSR 1)
   sub   r1,r4,r2 lsr 2      ; r1 = r4 -  (r2 LSR 2)

   mov   r2,r0 lsr 16
   and   r2,r2,$ff
   add   r3,r3,1024
   ldr   r2,[r3,r2 lsl 2]   ; BCD (read BCD table)

   add   r4,r2,r1      ; r4 = r2 + r1
   eor   r2,r1,r2      ; r2 = r1 XOR r2
   eor   r2,r4,r2      ; r2 = r4 XOR r2
   bic   r2,r5,r2      ; r2 = 0x11111110 AND NOT r2
   orr   r2,r2,r2 lsr 1      ; r2 = r2 OR (r2 LSR 1)
   sub   r1,r4,r2 lsr 2      ; r1 = r4 -  (r2 LSR 2)

   ldmfd   sp!, {r2,r3,r4,r5}
   bx lr


BCD_add_table:

<<Please place the latest table hereafter.>>





Please give me a little credit if you use it ;)


Miked0801 pretended earlier in this thread that there's a faster way.
So what is it [without consuming a 16MB lookup table]? ;)
I'm very curious ...

-- Kay

#42132 - tepples - Fri May 06, 2005 2:38 pm

Thanks.

#42141 - Miked0801 - Fri May 06, 2005 6:22 pm

Quote:

Miked0801 pretended earlier in this thread that there's a faster way.
So what is it [without consuming a 16MB lookup table]? ;)
I'm very curious ...


Lol - my response had only to do with the looping over the swi 16 way back when. The only pretending I'm doing here is in understanding exactly how this works. There's too much bit fun for me to just read the code and understand what it is you are doing. From a purely technical view, though, you guys have got this routine tight: no looping, only 3 memory accesses, very few mov instructions, lots of free shifting, etc.

That said, I'd investigate reversing the initial 1234567 such that you don't need to "and" & shift each time you read from the register - you should be able to just shift it around as needed. Also, it feels like the mov r2,r0 lsr 16 could somehow be optimized away - though how, I'm not sure. Just an intuition thing. Other comments welcome :)

#42246 - FluBBa - Sun May 08, 2005 10:53 am

Miked0801 wrote:
Also, it feels like the mov r2,r0 lsr 16 could somehow be optimized away - though how, I'm not sure. Just an intuition thing. Other comments welcome :)


Something like this?
Code:

  and  r1,r0,$ff00
  add  r3,r3,1024
  ldr  r1,[r3,r1 lsr 6]  ; BCD (read BCD table)
............
  and  r2,r0,$ff0000
  add  r3,r3,1024
  ldr  r2,[r3,r2 lsr 14]  ; BCD (read BCD table)


Oh, and you don't have to save r2 & r3 to the stack (r0-r3 and r12 are scratch registers that a function may freely overwrite), so you can use r12 without a problem. r0 should also be the return register, so the last line should read:
Code:

sub r0,r4,r2,lsr#2


#42323 - Miked0801 - Mon May 09, 2005 10:27 pm

Yep - something like that. In ARM, if you are using a mov instruction just for a shift, and other instructions around it aren't shifting, there's almost always a way to remove the mov. Just something to watch for.

How about the reversing-the-numbers idea? Does anyone want to take a quick crack at that? I bet it removes at least one instruction from the front.

#42934 - strager - Tue May 17, 2005 12:05 am

My code:

Code:

~~~
~~~


Edit: Added the (hopefully working) integer_to_string function.
Edit: Doesn't work; too lazy to debug.


Last edited by strager on Tue May 17, 2005 9:50 pm; edited 1 time in total

#42946 - tum_ - Tue May 17, 2005 8:15 am

strager wrote:
My code (in the works):

Code:

~~~
@
@ These might work.. Haven't tested :-)


Edit: Added the (hopefully working) integer_to_string function.


:-) Man, this is bad. It's a good idea to test before you post.
This will not even compile...

#42973 - isildur - Tue May 17, 2005 6:09 pm

Yes, most of the time when I read that it was not tested, I don't even bother trying it, unless I know the coder is very good. Please test your code.