gbadev.org forum archive

Do you have any fast code to do that?

The best way is to use the BIOS divide: you need both the quotient and the remainder of a division by 10, and the BIOS call returns both at once. Here's how I do it:

Code:

char *itoa(int num)
{
char *str, neg;
long q, r, a; //quotient, remainder, abs(quotient) (unused, but divfull needs it)

if(num < 0)
{
neg = 1;
num = -num;
}
else
neg = 0;
str = tempStr + 256; //tempStr is a global var, so it never gets overwritten, but you could use a local string as long as you strcpy the return right away
*--str = 0; //null at the end
q = num;
do //do-while, so that 0 still produces "0" instead of an empty string
{
divfull(q, 10, &q, &r, &a);
*--str = '0' + r;
} while(q);
if(neg)
*--str = '-';
return str;
}

I just typed that from memory, so it may have problems, but you get the idea.

I will try that

I turned the code into this:

char* itos(int num) {
static char str[16];
u32 p=15, r=0, a=0, neg=0; //p starts at the last valid index of str[16]

if(num < 0) {
neg = 1;
num = -num;
}

str[p] = 0; //Add the terminator char

do {
num = divfull(num, 10, &r, &a);
str[--p] = r+48; //Turn number into ascii value
} while(num);

if(neg == 1)
str[--p] = '-';

return str+p;
}

I had some problems with the divfull function, because the SWI places the remainder in r1, not r2 :/

I'm now trying to get this working in asm.

What library do you need for divfull? My compiler says it doesn't know what divfull is. I understand that the GBA doesn't have a very good divide, so how does divfull work? Maybe I can just implement those routines instead.

What you need is the quotient and remainder. divfull will return both with a single divide call, so it's faster than doing one divide for the quotient and a second for the remainder. You could just use

Code:

r = q % 10;
q = q / 10;

in place of divfull, and the compiler may optimize it to a single divide, depending on how GCC's divide function works (I've always used div()-type functions on GBA, so I don't know much about the library one).
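For what it's worth, standard C also has `div()` in `<stdlib.h>`, which hands back quotient and remainder in one struct. Whether that ends up as a single divide on the GBA depends on the C library, so take this as a sketch only. The helper name `fmt_nonneg` and its buffer convention are made up for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: writes the decimal digits of n (n >= 0) right-aligned
   into buf[0..11] and returns a pointer to the first digit. */
char *fmt_nonneg(int n, char buf[12])
{
    char *p = buf + 11;
    *p = '\0';
    do {
        div_t d = div(n, 10);        /* quotient and remainder from one call */
        *--p = (char)('0' + d.rem);
        n = d.quot;
    } while (n);                     /* do-while so that 0 prints as "0" */
    return p;
}
```

The returned pointer is somewhere inside buf, so copy or print it before reusing the buffer.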

Although since the original post, I've written a much faster itoa in ASM that uses reciprocal multiplication in place of the divide. It's only accurate to 16 bits though, so it won't work if you're printing really big numbers. I could do a 32-bit one, but it would have to use ARM to get the long multiplication instructions, and therefore would be much slower unless placed in IWRAM, but that would be a waste because it's not used that often.
Anyway, here it is

Code:

.global itoa
.thumb
.align 2
.thumb_func
itoa:
push {r4, r5}
mov r5, r0
bpl itoa_plus
neg r0, r0
itoa_plus:
ldr r4, =tempStr @a global array, should be 16 bytes. You could also pass the output location as a parameter if you don't like globals
add r4, #15 @start at the end of the string and work our way back
mov r1, #0
strb r1, [r4]
ldr r1, =52429 @1/10 in 0.19 fixed point (0xCCCD), exact for any 16-bit value

itoa_loop:
mov r3, r0
mul r3, r1 @mul by 1/10
lsr r3, r3, #19 @which was fixed-point, so shift down
lsl r2, r3, #3
add r2, r3
add r2, r3 @*8+1+1 = multiplied by 10
sub r0, r2 @which when subtracted from the original gives you the remainder
add r0, #'0'
sub r4, #1
strb r0, [r4]
mov r0, r3
bne itoa_loop

cmp r5, #0
bpl itoa_return
mov r0, #'-'
sub r4, #1
strb r0, [r4]

itoa_return:
mov r0, r4
pop {r4, r5}
bx lr

.pool
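The reciprocal trick is easy to check in C before trusting it in asm. This is a sketch, not the routine above: I'm assuming the multiplier 0xCCCD with a shift of 19, which gives the exact quotient for every 16-bit input. A multiplier with fewer fractional bits gives up earlier; 6554 with a shift of 16, for example, first goes wrong at 16389. The function names are made up.

```c
#include <stdint.h>

/* n / 10 without a divide: 0xCCCD is ceil(2^19 / 10). */
static uint32_t div10_u16(uint32_t n)
{
    return (n * 0xCCCDu) >> 19;      /* no 32-bit overflow for n <= 65535 */
}

/* Returns 1 if the shortcut matches real division for every 16-bit input. */
int div10_u16_is_exact(void)
{
    uint32_t n;
    for (n = 0; n <= 65535u; n++) {
        uint32_t q = div10_u16(n);
        if (q != n / 10u || n - q * 10u != n % 10u)
            return 0;
    }
    return 1;
}
```

An exhaustive loop like this is cheap on a PC and tells you exactly where a given constant stops being accurate.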

_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

You can check out my posprintf library for fast, free integer-to-string conversion using the familiar sprintf interface.

Dan.

You may try this method, which may be the fastest way to convert a binary integer to 4-bit BCD (Binary Coded Decimal) ... and then to a string (just add $30 to each extracted 4-bit value to get ASCII).

I personally use this piece of code, translated to ARM assembly, in my own applications.

No divide!
No reciprocal multiply!

Source quickly coded in BlitzBasic ...
It's self-explanatory, no? :)

It works for unsigned values from 0 to 65535 and is adaptable for much more ...

INT value to convert in n
result in a

Code:

n = 12345
a = 0
For i = 0 To 15
Read bcd%
If ((n Shr i) And 1) = 1 Then
   t1 = $06666666 + a
   t2 = t1 + bcd%
   t3 = t1 Xor bcd%
   t4 = t2 Xor t3
   t5 = ($ffffffff Xor t4) And $11111110
   t6 = (t5 Shr 2) Or (t5 Shr 3)
   a = (t2 - t6)
EndIf
Next

Print Hex(a)
KeyWait()

End

.table:
Data $1
Data $2
Data $4
Data $8
Data $16
Data $32
Data $64
Data $128
Data $256
Data $512
Data $1024
Data $2048
Data $4096
Data $8192
Data $16384
Data $32768
Data $65536
Data $131072
Data $262144
Data $524288
Data $1048576
Data $2097152
Data $4194304
Data $8388608
Data $16777216
Data $33554432
Data $67108864

Kay.

Kay wrote:

Source quickly coded in BlitzBasic ...
It's self-explanatory, no? :)

Not exactly self explanatory... anyone wanna rewrite this in C or asm...
_________________
www.hungrydeveloper.com
Version 2.0 now up - guaranteed at least 100% more pleasing!

*cracks knuckles*

Well, my BASIC skills aren't too rusty, I hope... but I believe in C it would translate into something like this:

Code:

static const unsigned int bcdData[] = {
0x1, 0x2, 0x4, 0x8,
0x16, 0x32, 0x64, 0x128,
0x256, 0x512, 0x1024, 0x2048,
0x4096, 0x8192, 0x16384, 0x32768,
0x65536, 0x131072, 0x262144, 0x524288,
0x1048576, 0x2097152, 0x4194304, 0x8388608,
0x16777216, 0x33554432, 0x67108864
};

unsigned int Bin2BCD(unsigned int n)
{
unsigned int a, bcd, i, t1, t2, t3, t4, t5, t6;

a = 0;

for (i = 0; i < 16; i++)
{
   bcd = bcdData[i];
   if ((n >> i) & 1)
   {
      t1 = 0x06666666 + a;
      t2 = t1 + bcd;
      t3 = t1 ^ bcd;
      t4 = t2 ^ t3;
      t5 = (0xFFFFFFFF ^ t4) & 0x11111110;
      t6 = (t5 >> 2) | (t5 >> 3);
      a = t2 - t6;
   }
}

return a;
}

It's an interesting-looking routine, although I haven't tested it or anything. And I'm not sure why the last 11 elements of the array are required, since they don't seem to get referenced at all by the routine.
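Since the routine was called not exactly self-explanatory: as far as I can tell, the body of the if is just a packed-BCD addition of two 8-digit values. Here is that one step pulled out into its own function with my reading of each line in the comments. The name `bcd_add` is mine, and the explanation is an interpretation rather than anything Kay stated.

```c
#include <stdint.h>

/* Adds two packed-BCD numbers (8 digits, one per nibble). */
uint32_t bcd_add(uint32_t a, uint32_t b)
{
    /* Bias digits 0-6 by 6 so that a decimal carry becomes a nibble carry. */
    uint32_t t1 = a + 0x06666666u;
    uint32_t t2 = t1 + b;                    /* plain binary add */
    /* sum XOR both inputs: bit 4k is set if a carry went into digit k. */
    uint32_t carries = t2 ^ t1 ^ b;
    /* Digits 0-6 that did NOT carry out still contain the bias. */
    uint32_t none = ~carries & 0x11111110u;
    /* Turn each flag into a 6 in the digit below it and take it back out. */
    return t2 - ((none >> 2) | (none >> 3));
}
```

The conversion loop then simply BCD-adds the BCD form of 2^i for every set bit i of the input.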

I'm still pretty sure my posprintf is faster, though. ;)

Dan.

Please don't use the BIOS divide; it's very slow! Just use the / operator. The divide is by a constant, so the compiler should optimise it away (in ARM mode, devkitAdv gcc uses reciprocal multiplication).

Code:

void u32toa(u32 value, char* output) {

char ch[12]; //enough for max numChars required
u32 v = value;
u32 oldv = 0;
s32 chIdx = 0;
do{
   oldv = v;
   v /= 10; //compiler should optimise divide by constant - best to check asm output
   ch[chIdx] = (oldv - (v*10) + 48); //48 is start of numbers in ASCII
   ++chIdx;
}while(v);
//we have our decimal digits but in reverse order... so re-order them
s32 outIdx = 0;
do{
   --chIdx;
   output[outIdx] = ch[chIdx];
   ++outIdx;
}while(chIdx > 0);
output[outIdx] = 0; //terminating zero
return;
}

char* output better be big enough to hold max digits possible...
If you want to use signed integers, you will need a little extra pre and post processing
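The extra pre and post processing for signed values could look roughly like this. It is a sketch under my own assumptions: the name `s32toa` is made up, it uses the `<stdint.h>` types instead of the GBA typedefs so it compiles anywhere, and the digit loop is repeated so the block stands alone.

```c
#include <stdint.h>

/* output must hold at least 12 bytes ("-2147483648" plus terminator). */
void s32toa(int32_t value, char *output)
{
    char tmp[10];                    /* at most 10 digits */
    int n = 0;
    uint32_t v = (uint32_t)value;

    if (value < 0) {
        *output++ = '-';
        v = 0u - v;                  /* magnitude in unsigned math, safe for INT32_MIN */
    }
    do {
        uint32_t q = v / 10u;        /* constant divide, as in u32toa */
        tmp[n++] = (char)('0' + (v - q * 10u));
        v = q;
    } while (v);
    while (n > 0)                    /* digits came out in reverse order */
        *output++ = tmp[--n];
    *output = '\0';
}
```

Negating in unsigned arithmetic avoids the overflow you would get from -value on the most negative number.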

cheers

Col

Here's the ARM assembly version of my INT to BCD conversion code, for those having trouble translating it into assembly (a very simple thing).

Giving out code directly like this doesn't seem very educational to me.
The best way, IMHO, is to understand things yourself from formulas, methodologies or examples ... and of course by coding them.

Feel free to compare overall performances with other pieces of code.
Thanks for any feedback ...

Code:

Binary_to_BCD_conversion:
; in R0 => binary value [24 bits max]
; out R1 <= BCD value [8 digits 0 - 99999999]

mov r1,0             ; a
mov r2,24
ldr r3,=BCD_conversion_table    ; BCD
ldr r4,=$06666666          ; carry care overflow value
ldr r5,=$11111110          ; carry count mask value
mov r6,1
_Binary_to_BCD_conversion_loop:
ldr r7,[r3],4!          ; BCD (read BCD table)
tst r0,r6 lsl r2          ; test bit
beq _Binary_to_BCD_conversion_loop_count
add r8,r4,r1          ; t1 = $06666666 + a
add r9,r8,r7          ; t2 = t1 + BCD
eor r10,r8,r7          ; t3 = t1 XOR BCD
eor r11,r9,r10          ; t4 = t2 XOR t3
mvn r10,0             ; t3 = $FFFFFFFF
eor r10,r10,r11          ; t3 = t3 XOR t4
and r10,r10,r5          ; t3 = t3 AND $11111110
mov r8,r10 lsr 2          ; t1 = t3 LSR 2
orr r8,r8,r10 lsr 3          ; t1 = t1 OR t3 LSR 3
sub r1,r9,r8          ; a = t2 - t1
_Binary_to_BCD_conversion_loop_count:
subs r2,r2,1
bge _Binary_to_BCD_conversion_loop

mov pc,lr

BCD_conversion_table:
@DCD    $16777216,$08388608,$04194304,$02097152,$01048576,$00524288,$00262144,$00131072
@DCD    $00065536,$00032768,$00016384,$00008192,$00004096,$00002048,$00001024,$00000512
@DCD    $00000256,$00000128,$00000064,$00000032,$00000016,$00000008,$00000004,$00000002
@DCD    $00000001

-- Kay

http://cvs.sourceforge.net/viewcvs.py/demi/stdgba/src/stdgba_stdlib.S?rev=1.1&view=auto
_________________
death scream...

Ugh... it loops over an SWI6... Dan no like...

Dan.

:) - There are faster ways indeed

I know it's an old thread but I needed a routine to convert a hex number to BCD in ARM asm, so I took Kay's code above and modified it so it could compile in devkitARM.

For those interested, here it is:

Code:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@
@ u32 Bin2BCD(u32 hexVal);
@
@ in r0 => binary value [24 bits max]
@ out r0 <= BCD value [8 digits 0 - 99999999]

Bin2BCD:
stmfd sp!, {r4, r5, r6, r7, r8, r9, r10, r11}

mov r1, #0
mov r2, #24
ldr r3, =BCD_conversion_table    @ BCD
ldr r4, =0x06666666          @ carry care overflow value
ldr r5, =0x11111110          @ carry count mask value
mov r6, #1

_Binary_to_BCD_conversion_loop:
ldr r7, [r3], #4             @ BCD (read BCD table)
tst r0, r6, lsl r2             @ test bit

beq _Binary_to_BCD_conversion_loop_count
add r8, r4, r1             @ t1 = $06666666 + a
add r9, r8, r7             @ t2 = t1 + BCD
eor r10, r8, r7          @ t3 = t1 XOR BCD
eor r11, r9, r10          @ t4 = t2 XOR t3
mvn r10, #0             @ t3 = $FFFFFFFF
eor r10, r10, r11          @ t3 = t3 XOR t4
and r10, r10, r5          @ t3 = t3 AND $11111110
mov r8, r10, lsr #2       @ t1 = t3 LSR 2
orr r8, r8, r10, lsr #3 @ t1 = t1 OR t3 LSR 3
sub r1, r9, r8             @ a = t2 - t1

_Binary_to_BCD_conversion_loop_count:
subs r2, r2, #1
bge _Binary_to_BCD_conversion_loop

mov r0, r1

ldmfd sp!, {r4, r5, r6, r7, r8, r9, r10, r11}
bx lr

.pool

BCD_conversion_table:
.word 0x16777216
.word 0x08388608
.word 0x04194304
.word 0x02097152
.word 0x01048576
.word 0x00524288
.word 0x00262144
.word 0x00131072
.word 0x00065536
.word 0x00032768
.word 0x00016384
.word 0x00008192
.word 0x00004096
.word 0x00002048
.word 0x00001024
.word 0x00000512
.word 0x00000256
.word 0x00000128
.word 0x00000064
.word 0x00000032
.word 0x00000016
.word 0x00000008
.word 0x00000004
.word 0x00000002
.word 0x00000001

By the way, if someone has a faster routine for this, I'd be interested. ;)

Without actually paying much attention to the actual algorithm...
What's the difference between these two?

Code:

mvn r10, #0 @ t3 = $FFFFFFFF
eor r10, r10, r11 @ t3 = t3 XOR t4
----------------------------------------
mvn r10, r11 @ t3 = ~t4

And maybe we can do like this...

Code:

mov r8, r10, lsr #2 @ t1 = t3 LSR 2
orr r8, r8, r10, lsr #3 @ t1 = t1 OR t3 LSR 3
sub r1, r9, r8 @ a = t2 - t1
----------------------------------------
orr r8, r8, r10, lsr #1 @ t1 = t1 OR t3 LSR 1
sub r1, r9, r8,lsr#2 @ a = t2 - t1 LSR 2

_________________
I probably suck, my not is a programmer.

FluBBa wrote:

Without actually paying much attention to the actual algorithm...
What's the difference between these two?

Code:

mvn r10, #0 @ t3 = $FFFFFFFF
eor r10, r10, r11 @ t3 = t3 XOR t4
----------------------------------------
mvn r10, r11 @ t3 = ~t4

And maybe we can do like this...

Code:

mov r8, r10, lsr #2 @ t1 = t3 LSR 2
orr r8, r8, r10, lsr #3 @ t1 = t1 OR t3 LSR 3
sub r1, r9, r8 @ a = t2 - t1
----------------------------------------
orr r8, r8, r10, lsr #1 @ t1 = t1 OR t3 LSR 1
sub r1, r9, r8,lsr#2 @ a = t2 - t1 LSR 2

Code:

mvn r10, r11 @ t3 = ~t4

Yep this works, but not the second optimization.

Sorry, the last one should be

Code:

orr r8, r10, r10, lsr #1 @ t1 = t1 OR t3 LSR 1
sub r1, r9, r8,lsr#2 @ a = t2 - t1 LSR 2


Cool, now it works :).
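The corrected rewrite relies on a shift distributing over OR, which is quick to confirm in C. The two function names below are made up; they correspond to the three-instruction and two-instruction forms discussed above.

```c
#include <stdint.h>

/* Original form: two shifts ORed together. */
uint32_t six_mask_a(uint32_t t5)
{
    return (t5 >> 2) | (t5 >> 3);
}

/* Rewritten form: OR first, then one shift that ARM folds into the sub. */
uint32_t six_mask_b(uint32_t t5)
{
    return (t5 | (t5 >> 1)) >> 2;
}
```

Both give 0x06666666 for t5 = 0x11111110, i.e. a 6 in every digit that has to be corrected.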

A little recoding to save some registers...

Code:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@
@ u32 Bin2BCD(u32 hexVal);
@
@ in r0 => binary value [24 bits max]
@ out r0 <= BCD value [8 digits 0 - 99999999]

Bin2BCD:
stmfd sp!, {r4, r5, r6, r7, r8}

mov r1, #0
ldr r3, =BCD_conversion_table @ BCD
ldr r4, =0x06666666 @ carry care overflow value
ldr r5, =0x11111110 @ carry count mask value

_Binary_to_BCD_conversion_loop:
ldr r7, [r3], #4 @ BCD (read BCD table)
movs r0, r0, lsr #1 @ test bit

bcc _Binary_to_BCD_conversion_loop_check
_Binary_to_BCD_calc:
add r8, r4, r1 @ t1 = $06666666 + a
add r6, r8, r7 @ t2 = t1 + BCD
eor r2, r8, r7 @ t3 = t1 XOR BCD
eor r2, r6, r2 @ t3 = t2 XOR t3
mvn r2, r2 @ t3 = ~t3
and r2, r2, r5 @ t3 = t3 AND $11111110
orr r8, r2, r2, lsr #1 @ t1 = t3 OR t3 LSR 1
sub r1, r6, r8, lsr #2 @ a = t2 - (t1 LSR 2)

_Binary_to_BCD_conversion_loop_check
bne _Binary_to_BCD_conversion_loop
mov r0, r1

ldmfd sp!, {r4, r5, r6, r7, r8}
bx lr

.pool

BCD_conversion_table:
.word 0x00000001
.word 0x00000002
.word 0x00000004
.word 0x00000008
.word 0x00000016
.word 0x00000032
.word 0x00000064
.word 0x00000128
.word 0x00000256
.word 0x00000512
.word 0x00001024
.word 0x00002048
.word 0x00004096
.word 0x00008192
.word 0x00016384
.word 0x00032768
.word 0x00065536
.word 0x00131072
.word 0x00262144
.word 0x00524288
.word 0x01048576
.word 0x02097152
.word 0x04194304
.word 0x08388608
.word 0x16777216

I think that should work

Great job! One thing though, the label _Binary_to_BCD_conversion_loop_check is missing a colon :)

You know, I'm thinking of creating a repository of fast ARM utility routines for the GBA. Instead of having to search this forum, common fast routines could be found at one web page. Coders would be challenged to improve and optimize those routines and contribute new ones, always in the quest to make the fastest code.

I will prepare something...

Please. That's the kind of thing I could sink my teeth into :)

That would be great. I'm still learning ARM assembly and I find that reading through code and thinking about optimizing it is the easiest way to improve my knowledge of the instruction set.

I've been thinking about doing the same thing for some time now. If you start it up, I'm sure to contribute.
I have a lot of routines in my emulators, the latest is a Gamma corrector which is quite good and not so specialized. Takes a byte (0x00-0xFF) and gamma value and spits out 5bit (0x00-0x1F) gamma corrected value.
We should probably add the pixel plotter and line drawing routines as well.
Sqrt, div, lots of stuff to add.. =)

Hey that's great, I'll PM you soon and we will put something up.

Sounds like a fantastic idea guys ! keep us posted :)

[mRg]

Well, for a start, here's my division implementation for Demi (the lib I'm working on; the sf.net CVS is terribly outdated & abandoned, and I'm still waiting for approval at Savannah):

Code:

@ This program is free software; you can redistribute it and/or modify
@ it under the terms of the GNU General Public License as published by
@ the Free Software Foundation; either version 2 of the License, or
@ (at your option) any later version.

.align
.arm

div:
#ifdef DIV_ACTS_AS_SWI6 @ preserve r2 & r12, r3=|a/b|, r1=a%b
stmfd sp!, {r0-r2, r12}
#endif

eor r12, r0, r1 @ sgn(r12) == sgn(r0*r1)
movs r2, r0, lsl #1 @ r0 = abs(r0)
rsbcs r0, r0, #0
movs r2, r1, lsl #1 @ r1 = abs(r1)
rsbcs r1, r1, #0
mov r2, #1 @ bit_c
0:
cmp r0, r1
bls 1f
mov r1, r1, lsl #1
movs r2, r2, lsl #1
bne 0b
1:

mov r3, #0 @ r3 = |res|
tst r2, r2
beq 3f

2:
cmp r1, r0
subls r0, r0, r1
orrls r3, r3, r2
mov r1, r1, lsr #1
movs r2, r2, lsr #1
bne 2b
3:
movs r12, r12, lsl #1
#ifdef DIV_ACTS_AS_SWI6
ldmfd sp!, {r0-r2, r12}
mul r1, r3, r1
rsbcs r1, r1, #0
sub r1, r0, r1
#endif

#ifdef DIV_UNSIGNED
mov r0, r3
#else
movcc r0, r3
rsbcs r0, r3, #0
#endif

#if defined(DIV_ACTS_AS_SWI6) && !defined(DIV_UNSIGNED) && defined(DIV_MOD_ABS) @ r1 = |a%b|
cmp r1, #0
rsblt r1, r1, #0
#endif

bx lr

It's not unrolled, for the sake of keeping the code small, assuming that it will be placed in IWRAM.
The fastest and shortest version is the one built with DIV_UNSIGNED defined and DIV_ACTS_AS_SWI6 undefined (the loops stay the same, so the difference won't be large). A Thumb version would be much slower, so I didn't bother.

Code:

udiv:
mov r2, #1 @ bit_c
0:
cmp r0, r1
bls 1f
mov r1, r1, lsl #1
movs r2, r2, lsl #1
bne 0b

1:
mov r3, #0 @ r3 = |res|
tst r2, r2
beq 3f

2:
cmp r1, r0
subls r0, r0, r1
orrls r3, r3, r2
mov r1, r1, lsr #1
movs r2, r2, lsr #1
bne 2b
3:
mov r0, r3

bx lr
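For anyone who reads C more easily than ARM, the shift-and-subtract idea behind udiv looks roughly like this. It is a sketch and not a line-for-line translation: the name `udiv_shift`, the remainder out-parameter, the divide-by-zero result and the guard against shifting the divisor's top bit out are all my own additions.

```c
#include <stdint.h>

uint32_t udiv_shift(uint32_t a, uint32_t b, uint32_t *rem)
{
    uint32_t bit = 1, q = 0;

    if (b == 0) {                    /* assumption: return 0 and leave a as remainder */
        *rem = a;
        return 0;
    }
    /* Line the divisor up under the dividend, tracking the matching quotient bit. */
    while (b < a && !(b & 0x80000000u)) {
        b <<= 1;
        bit <<= 1;
    }
    /* Walk back down, subtracting wherever the shifted divisor still fits. */
    while (bit) {
        if (a >= b) {
            a -= b;
            q |= bit;
        }
        b >>= 1;
        bit >>= 1;
    }
    *rem = a;
    return q;
}
```

The first loop corresponds to label 0 in the asm and the second to label 2; what is left in a at the end is the remainder.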

BTW, disassembling DarkFader's BIOS dump (in CowBite & VBA) gives the following for swi6 & swi7. It includes unnecessary stuff like swapping r0 and r1 and an unused cmp r0, r2 after label 1, and bls could be used instead of the three hi tests. The dump also seems to be erroneous: it never gave a correct result, and I didn't get what was actually intended either...

Code:

swi6:
mov r3, r0
mov r0, r1
mov r1, r3
ands r3, r1, #(1<<31)
rsbmi r1, r1, #0
eors r12, r3, r0, asr #20 @ CowBite didn't output an asr #20
rsbcs r0, r0, #0
movs r2, r1
0:
cmps r2, r0, lsr #1
movls r2, r2, lsl #1
bcc 0b
1:
cmps r0, r2
adc r3, r3, r3
subcs r0, r0, r2
teqs r2, r1
movne r2, r2, lsr #1
bne 1b

mov r1, r0
mov r0, r3
movs r12, r12, lsl #1
rsbcs r0, r0, #0
rsbmi r1, r1, #0
bx lr

swi7:
stmfd sp!, {r4}
mov r12, r0
mov r1, #1

0:
cmp r0, r1
movhi r0, r0, lsr #1
movhi r1, r1, lsl #1
bhi 0b
1:
mov r0, r12
mov r4, r1
mov r3, #0
mov r2, r1
2:
cmps r0, r2
adc r3, r3, r3
subcs r0, r0, r2
teqs r2, r1
movne r2, r2, lsr #1
bne 2b

add r1, r1, r3
movs r1, r1, lsr #1
cmp r1, r4
bcc 1b
mov r0, r4
ldmfd sp!, {r4}
bx lr

Has anyone got the BIOS dump working? Or does anyone have a working dump?

Thanks, I will add it to the upcoming repository. But, is this faster than using swi 0x60000 ?

I guess so, when placed in iwram.
However, since i haven't seen the swi6 source (not talking about the non-working dump), nor have the actual hardware (+ emulator timing isn't perfect), i can only guess.

depending on what you are using the output for, you could optimise differently. For example, in one game I was working on, I just wanted a 5 digit, 0 filled number. So, since I knew that my numbers were always going to be right aligned, I could just fill in the output string directly.

Here is the code I was using (I'm using a / and a % -- probably should modify it to only use the / like the example above, but for demonstration purposes this works):

Code:

for(int i = 4; i >= 0; i--) {
buffer[i] = '0' + aNumber % 10;
aNumber /= 10;
}

Basically, the number is passed in via aNumber and the result is returned in buffer. In my routine, I use a static buffer, since I'm using the result right away -- would be really easy to modify to use a passed in buffer, but I didn't feel like it for my proggy.
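The divide-only variant mentioned above could look like this. A sketch with a passed-in buffer; the name `put5` is made up.

```c
/* Writes n as exactly five zero-filled digits; buffer needs 6 bytes.
   Values above 99999 lose their upper digits. */
void put5(unsigned n, char buffer[6])
{
    int i;
    for (i = 4; i >= 0; i--) {
        unsigned q = n / 10u;                    /* the only divide */
        buffer[i] = (char)('0' + (n - q * 10u)); /* remainder by multiply and subtract */
        n = q;
    }
    buffer[5] = '\0';
}
```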
_________________
dennis

Here is my IntToStr for the x86. D$ means DWORD PTR; and B$ means BYTE PTR. If num is more than one digit, then what is required to extract the last digit is to work out num without the last digit, and then subtract that from num.

Had we been doing Int to Hex String we would simply AND with 0xfffffff0 to drop the last digit, but we are working in decimal, so (num / 10) * 10 does the job.

At the end of the End_If statement, eax = num. EDI is updated so that after you make your recursive call the position is updated. For num=100, two recursive calls are made before the original call displays. And EAX will be multiplied by 10 ( note that before a recursive call is made that EAX is divided by 10 ).

Code:

; int intToStr ( num, str )
; eax = number to convert
; edi = address of string
;
; PURPOSE:
; convert integer, num to a String at address str
;
; RETURNS:
; eax = eax * 10
; edi = edi + 1
intToStr:

If eax < 10
add eax '0' ; eax = num + '0'
mov B$edi al
sub eax '0' ; eax = num

Else
push eax ; store num

xor edx edx
mov ebx 10
div ebx ; eax = num / 10

call intToStr
; eax = (num / 10) * 10
mov ebx eax

pop eax ; restore num
push eax ; store num

sub eax ebx ; eax = num, ebx = (num / 10) * 10

add eax '0' ; copy number to string
mov B$edi al

pop eax

End_If

mov ebx 10 ; eax = eax * 10
mul ebx ; .

inc edi

ret

For those who do not know X86 assembly, here is what you're missing. First of all If, End_If are macros. AL is the low byte of EAX, which is 32 bit. EBX, EAX and EDI are all 32-bit registers. EDX is cleared before dividing EAX by EBX because EDX is used by the division as the high DWORD.
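The same recursion is perhaps easier to follow in C. This is a loose sketch of the idea, not a translation of the register conventions: the name `int_to_str_rec` is made up, and it returns the updated write position instead of passing it back in EDI.

```c
/* Writes num in decimal at str and returns a pointer to the terminator. */
char *int_to_str_rec(unsigned num, char *str)
{
    unsigned tens = 0;
    if (num >= 10u) {
        str = int_to_str_rec(num / 10u, str);  /* print the leading digits first */
        tens = (num / 10u) * 10u;              /* num without its last digit */
    }
    *str++ = (char)('0' + (num - tens));       /* last digit = num - (num / 10) * 10 */
    *str = '\0';
    return str;
}
```

For num = 100 this makes two recursive calls before the outermost call writes its digit, matching the description above.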

some basic stuff...

Code:

@ void* plot_modeX(void* vram, u8 x, u8 y, u16 color)
@ return: address of the destination pixel

.align
.arm

plot_mode3:
add r1, r2, lsl #8
sub r1, r2, lsl #4
add r0, r1, lsl #1
strh r3, [r0]
bx lr

.align
.arm

@ r3 is assumed to be a combination of two adjacent pixels
plot_mode4:
bic r1, r1, #1 @ two 8-bit pixels per halfword, so align x
add r0, r0, r1 @ vram + x
add r0, r2, lsl #8
sub r0, r2, lsl #4
strh r3, [r0]
bx lr

.align
.arm

plot_mode5:
add r1, r2, lsl #7
add r1, r2, lsl #5
add r0, r1, lsl #1
strh r3, [r0]
bx lr

@ void/void* flip()
@ return: nothing/vram

.align
.arm

@ doesn't look like the fastest it could be...
flip:
ldr r1, =0x4000000 @ REG_DISPCNT
ldrh r2, [r1]
mov r3, #0x10 @ page-select bit
eor r2, r2, r3
#ifdef FLIP_RETURNS_VRAM
tst r2, r3
ldr r0, =0x6000000 @ MEM_VRAM
ldrne r3, =0xa000 @ PAGE_SIZE
addne r0, r0, r3
#endif
strh r2, [r1] @ write back even when no address is returned
bx lr

I've just written them, i.e. I suggest you test them before adding.
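The shift pairs in the plotters are just multiplications by the screen width split into powers of two. A small C sketch makes them easy to check off-target; the function names are made up and nothing here touches hardware.

```c
#include <stdint.h>

/* Mode 3 is 240 pixels wide: y*240 = y*256 - y*16. */
uint32_t mode3_index(uint32_t x, uint32_t y)
{
    return x + (y << 8) - (y << 4);
}

/* Mode 5 is 160 pixels wide: y*160 = y*128 + y*32. */
uint32_t mode5_index(uint32_t x, uint32_t y)
{
    return x + (y << 7) + (y << 5);
}

/* Byte offset of a mode 3 row: y*480 = (y*16 - y) * 32. */
uint32_t mode3_row_bytes(uint32_t y)
{
    return ((y << 4) - y) << 5;
}
```

The pixel indices still have to be doubled for the 16-bit modes, which is what the final lsl #1 does.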

Last edited by f(DarkAngel) on Thu Sep 16, 2004 6:35 pm; edited 1 time in total

Mode3 plotter (quoting DekuTree64)
This way you don't have to supply the VRAM base address (as mode 3 doesn't have any pages).

Code:

@r0 = x
@r1 = y
@r2 = color
.global PlotPixel
.arm
.align 2
.section .iwram, "ax", %progbits
PlotPixel:
rsb r1, r1, r1, LSL #4 @r1 = y*15
add r1, r0, r1, LSL #5 @r1 = x+y*15*32 = x+y*480
add r0, r0, #0x6000000 @r0 = x+VRAM base
strh r2, [r0, r1] @this comes out to VRAM base+x*2+y*480
bx lr

Flip code...

Code:

flip:
mov r1, #0x4000000 @ REG_DISPCNT
ldrh r2, [r1]
eor r2, r2, #0x10
#ifdef FLIP_RETURNS_VRAM
tst r2, #0x10
mov r0, #0x6000000 @ MEM_VRAM
addne r0, #0xA000 @ PAGE_SIZE
#endif
strh r2, [r1]
bx lr


In many cases my previous code sample may be sufficient (for scores, bullet counts, lives, ...), but in some time-critical applications you may want it to run much faster.

If someone is still interested, there's a faster way at the cost of only 3~4 KBytes of lookup tables and only 2 BCD additions, running in constant time for 24-bit inputs (0 to 16,777,215, which fits in the 8 BCD digits 0 to 99,999,999).

The 32-bit lookup table values are generated this way (for 8 digits):

Code:

for i = 0 to 255
val_00_07(i) = BCD(i)
val_08_15(i) = BCD(i shl 8)
val_16_23(i) = BCD(i shl 16)
next

and then your INT is converted to BCD like this:

Code:

v1 = a AND $FF
v2 = (a SHR 8) AND $FF
v3 = (a SHR 16) AND $FF

a = BCD_ADD( val_00_07(v1) , val_08_15(v2) )
a = BCD_ADD( a , val_16_23(v3) )

Don't ask me for source code; I only give clues so that you learn how this works ;)
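Going by those clues, a C version might look like the sketch below. Everything here is my guess at the details: the table and function names are made up, the tables are filled at start-up with a slow divide-based conversion instead of being precomputed, and the BCD add is the same bias-and-correct step used earlier in the thread.

```c
#include <stdint.h>

static uint32_t val_00_07[256], val_08_15[256], val_16_23[256];

/* Slow reference conversion, only used to fill the tables. */
static uint32_t to_bcd_slow(uint32_t n)
{
    uint32_t r = 0;
    int shift = 0;
    while (n) {
        r |= (n % 10u) << shift;
        shift += 4;
        n /= 10u;
    }
    return r;
}

static uint32_t bcd_add8(uint32_t a, uint32_t b)
{
    uint32_t t1 = a + 0x06666666u;
    uint32_t t2 = t1 + b;
    uint32_t none = ~(t2 ^ t1 ^ b) & 0x11111110u;  /* digits without a carry out */
    return t2 - ((none >> 2) | (none >> 3));
}

void init_bcd_tables(void)
{
    uint32_t i;
    for (i = 0; i < 256u; i++) {
        val_00_07[i] = to_bcd_slow(i);
        val_08_15[i] = to_bcd_slow(i << 8);
        val_16_23[i] = to_bcd_slow(i << 16);
    }
}

/* Three lookups and two BCD additions; valid for a < 2^24. */
uint32_t bin24_to_bcd(uint32_t a)
{
    uint32_t r = bcd_add8(val_00_07[a & 0xFFu], val_08_15[(a >> 8) & 0xFFu]);
    return bcd_add8(r, val_16_23[(a >> 16) & 0xFFu]);
}
```

The three tables take 3 * 256 * 4 = 3072 bytes, and the run time no longer depends on how many bits are set.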

-- Kay

Hi guys, sorry for reviving this old thread that I've just come across.
It looks to me that this:

FluBBa wrote:

A little recoding to save some registers...

Code:

[snipped]

mvn r2, r2 @ t3 = ~t3
and r2, r2, r5 @ t3 = t3 AND $11111110

[snipped]

can be replaced by this:

Code:

bic r2, r5, r2 @ t3 = ~t3 AND $11111110

I'm not a GBA programmer, so correct me if I'm wrong, but I would use r12 instead of r8 in your function, because according to the APCS you don't need to save/restore it.

You don't need r7 either, as r2 can be re-used:

FluBBa wrote:

A little recoding to save some registers...

Code:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@
@ u32 Bin2BCD(u32 hexVal);
@
[snip]
_Binary_to_BCD_conversion_loop:
ldr r7, [r3], #4 @ BCD (read BCD table)
movs r0, r0, lsr #1 @ test bit

bcc _Binary_to_BCD_conversion_loop_check
_Binary_to_BCD_calc:
add r8, r4, r1 @ t1 = $06666666 + a
add r6, r8, r7 @ t2 = t1 + BCD
eor r2, r8, r7 @ t3 = t1 XOR BCD
eor r2, r6, r2 @ t3 = t2 XOR t3
mvn r2, r2 @ t3 = ~t3
and r2, r2, r5 @ t3 = t3 AND $11111110
orr r8, r2, r2, lsr #1 @ t1 = t3 OR t3 LSR 1
sub r1, r6, r8, lsr #2 @ a = t2 - (t1 LSR 2)

_Binary_to_BCD_conversion_loop_check
[snip]

Becomes:

Code:

_Binary_to_BCD_conversion_loop:
ldr r2, [r3], #4 @ BCD (read BCD table)
movs r0, r0, lsr #1 @ test bit

bcc _Binary_to_BCD_conversion_loop_check
_Binary_to_BCD_calc:
add r8, r4, r1 @ t1 = $06666666 + a
add r6, r8, r2 @ t2 = t1 + BCD
@ here we re-use r2 for t3 as we don't need BCD anymore:
eor r2, r8, r2 @ t3 = t1 XOR BCD
eor r2, r6, r2 @ t3 = t2 XOR t3
bic r2, r5, r2 @ t3 = ~t3 AND $11111110
orr r8, r2, r2, lsr #1 @ t1 = t3 OR t3 LSR 1
sub r1, r6, r8, lsr #2 @ a = t2 - (t1 LSR 2)

_Binary_to_BCD_conversion_loop_check

Not tested.

Nice, some more register removal...

Code:

_Binary_to_BCD_conversion_loop:
ldr r2, [r3], #4 @ BCD (read BCD table)
movs r0, r0, lsr #1 @ test bit

bcc _Binary_to_BCD_conversion_loop_check
_Binary_to_BCD_calc:
add r1, r4, r1 @ t1 = $06666666 + a
add r12, r1, r2 @ t2 = t1 + BCD
@ here we re-use r2 for t3 as we don't need BCD anymore:
eor r2, r1, r2 @ t3 = t1 XOR BCD
eor r2, r12, r2 @ t3 = t2 XOR t3
bic r2, r5, r2 @ t3 = ~t3 AND $11111110
orr r1, r2, r2, lsr #1 @ t1 = t3 OR t3 LSR 1
sub r1, r12, r1, lsr #2 @ a = t2 - (t1 LSR 2)

_Binary_to_BCD_conversion_loop_check


This optimization is now updated on the GBA ARM Code Repository.

mod edit: And the link is now fixed.

Why did you bump that old post? ;)

It wasn't really about BCD conversion, and I think BCD can't be used directly in real projects, because of course it would be hard for the average user to read the numbers =)

Well, I figured that for real int->string conversion of smaller numbers (16 bits) a table might be best (just counting how often I can subtract 1000, 100, 10... and so on). I used this on Pokemon Mini too, and in most cases you won't need to do it very often anyway (maybe a few times per frame).
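The subtract-and-count idea can be sketched in C like this. The table layout, the name `u16_to_str` and the leading-zero handling are my own choices, not the Pokemon Mini code.

```c
#include <stdint.h>

static const uint16_t pow10_tab[4] = { 10000, 1000, 100, 10 };

/* Writes n in decimal to out (needs 6 bytes) without any divide. */
char *u16_to_str(uint16_t n, char *out)
{
    char *p = out;
    int i, started = 0;

    for (i = 0; i < 4; i++) {
        char digit = '0';
        while (n >= pow10_tab[i]) {          /* count the subtractions */
            n = (uint16_t)(n - pow10_tab[i]);
            digit++;
        }
        if (digit != '0' || started) {       /* skip leading zeros */
            *p++ = digit;
            started = 1;
        }
    }
    *p++ = (char)('0' + n);                  /* what is left is the ones digit */
    *p = '\0';
    return out;
}
```

Each digit costs at most nine subtractions, except the first, which is at most six for a 16-bit value.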
_________________
Team Pokeme
My blog and PM ASM tutorials

I optimized this further by adding 0x06666666 to all the entries in the table...

Code:

Bin2BCD:
stmfd sp!, {r4,r5}

mov r1, #0
ldr r3, =BCD_conversion_table @ BCD
ldr r5, =0x11111110 @ carry count mask value

_Binary_to_BCD_conversion_loop:
ldr r2, [r3], #4 @ BCD (read BCD table)
movs r0, r0, lsr #1 @ test bit

bcc _Binary_to_BCD_conversion_loop_check
_Binary_to_BCD_calc:
add r4, r2, r1 @ r4 = r2 + r1
eor r2, r1, r2 @ r2 = r1 XOR r2
eor r2, r4, r2 @ r2 = r4 XOR r2
bic r2, r5, r2 @ r2 = 0x11111110 AND NOT r2
orr r2, r2, r2, lsr #1 @ r2 = r2 OR (r2 LSR 1)
sub r1, r4, r2, lsr #2 @ r1 = r4 - (r2 LSR 2)

_Binary_to_BCD_conversion_loop_check:
bne _Binary_to_BCD_conversion_loop
mov r0, r1

ldmfd sp!, {r4,r5}
bx lr

.pool

BCD_conversion_table:
.word 0x06666667
.word 0x06666668
.word 0x0666666a
.word 0x0666666e
.word 0x0666667c
.word 0x06666698
.word 0x066666ca
.word 0x0666678e
.word 0x066668bc
.word 0x06666b78
.word 0x0666768a
.word 0x066686ae
.word 0x0666a6fc
.word 0x0666e7f8
.word 0x0667c9ea
.word 0x06698dce
.word 0x066cbb9c
.word 0x067976d8
.word 0x068c87aa
.word 0x06b8a8ee
.word 0x076aebdc
.word 0x086fd7b8
.word 0x0a7fa96a
.word 0x0e9eec6e
.word 0x1cddd87c
.word 0x39bbaa98
.word 0x6d76eeca
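To double-check the pre-biased table, here is a C sketch that rebuilds an entry from scratch and runs the same loop. The names are made up, and the entry is computed on the fly where the asm does a table load.

```c
#include <stdint.h>

#define BCD_BIAS 0x06666666u

/* Slow reference conversion, only used to rebuild table entries. */
static uint32_t to_bcd_slow(uint32_t n)
{
    uint32_t r = 0;
    int shift = 0;
    while (n) {
        r |= (n % 10u) << shift;
        shift += 4;
        n /= 10u;
    }
    return r;
}

/* Entry i of the table: BCD(2^i) with the bias already folded in. */
uint32_t biased_entry(unsigned i)
{
    return to_bcd_slow(1u << i) + BCD_BIAS;
}

/* Valid for n up to 99999999 (27 bits, 8 BCD digits). */
uint32_t bin2bcd_biased(uint32_t n)
{
    uint32_t a = 0;
    unsigned i;
    for (i = 0; n; i++, n >>= 1) {
        if (n & 1u) {
            uint32_t b = biased_entry(i);
            uint32_t sum = a + b;                            /* bias comes along for free */
            uint32_t none = ~(sum ^ a ^ b) & 0x11111110u;    /* digits without a carry out */
            a = sum - ((none | (none >> 1)) >> 2);
        }
    }
    return a;
}
```

Folding the bias into the table saves one add per set bit, which is the whole gain of this version.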

Lupin wrote:

Why did you bump that old post? ;)

It is always better to post in an existing thread than to start a new one (Flash).
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

Cool, I will test it and add it to the GBA ARM Code Repository.

Gator: very nice work.
Just swap r5 to r12 and you save one of the stack accesses also.

As I said earlier in this topic, there's a faster way using a 3 KB BCD lookup table.
This removes the inner loop, improves overall performance, and makes the run time constant (worst and best cases take the same amount of time).
Please take a look (GoldRoad ASM format, using some optimisations given by FluBBa and Isildur, thanks to them):

Code:

@ARM

ldr r0,=1234567
and r1,r0,$ff

ldr r3,=BCD_add_table ; BCD lookup table

ldr r2,[r3,r1 lsl 2] ; BCD (read BCD table)

mov r1,r0 lsr 8
and r1,r1,$ff
add r3,r3,1024
ldr r1,[r3,r1 lsl 2] ; BCD (read BCD table)

ldr r5,=$11111110    ; carry count mask value

add r4,r2,r1    ; r4 = r2 + r1
eor r2,r1,r2    ; r2 = r1 XOR r2
eor r2,r4,r2    ; r2 = r4 XOR r2
bic r2,r5,r2    ; r2 = 0x11111110 AND NOT r2
orr r2,r2,r2 lsr 1    ; r2 = r2 OR (r2 LSR 1)
sub r1,r4,r2 lsr 2    ; r1 = r4 - (r2 LSR 2)

mov r2,r0 lsr 16
and r2,r2,$ff
add r3,r3,1024
ldr r2,[r3,r2 lsl 2] ; BCD (read BCD table)

add r4,r2,r1    ; r4 = r2 + r1
eor r2,r1,r2    ; r2 = r1 XOR r2
eor r2,r4,r2    ; r2 = r4 XOR r2
bic r2,r5,r2    ; r2 = 0x11111110 AND NOT r2
orr r2,r2,r2 lsr 1    ; r2 = r2 OR (r2 LSR 1)
sub r1,r4,r2 lsr 2    ; r1 = r4 - (r2 LSR 2)

loop:
b loop

BCD_add_table:
; bits 0 to 7
@dcd $00000000
@dcd $06666667,$06666668,$06666669,$0666666A,$0666666B,$0666666C,$0666666D,$0666666E,$0666666F,$06666676,$06666677,$06666678,$06666679,$0666667A,$0666667B,$0666667C
@dcd $0666667D,$0666667E,$0666667F,$06666686,$06666687,$06666688,$06666689,$0666668A,$0666668B,$0666668C,$0666668D,$0666668E,$0666668F,$06666696,$06666697,$06666698
@dcd $06666699,$0666669A,$0666669B,$0666669C,$0666669D,$0666669E,$0666669F,$066666A6,$066666A7,$066666A8,$066666A9,$066666AA,$066666AB,$066666AC,$066666AD,$066666AE
@dcd $066666AF,$066666B6,$066666B7,$066666B8,$066666B9,$066666BA,$066666BB,$066666BC,$066666BD,$066666BE,$066666BF,$066666C6,$066666C7,$066666C8,$066666C9,$066666CA
@dcd $066666CB,$066666CC,$066666CD,$066666CE,$066666CF,$066666D6,$066666D7,$066666D8,$066666D9,$066666DA,$066666DB,$066666DC,$066666DD,$066666DE,$066666DF,$066666E6
@dcd $066666E7,$066666E8,$066666E9,$066666EA,$066666EB,$066666EC,$066666ED,$066666EE,$066666EF,$066666F6,$066666F7,$066666F8,$066666F9,$066666FA,$066666FB,$066666FC
@dcd $066666FD,$066666FE,$066666FF,$06666766,$06666767,$06666768,$06666769,$0666676A,$0666676B,$0666676C,$0666676D,$0666676E,$0666676F,$06666776,$06666777,$06666778
@dcd $06666779,$0666677A,$0666677B,$0666677C,$0666677D,$0666677E,$0666677F,$06666786,$06666787,$06666788,$06666789,$0666678A,$0666678B,$0666678C,$0666678D,$0666678E
@dcd $0666678F,$06666796,$06666797,$06666798,$06666799,$0666679A,$0666679B,$0666679C,$0666679D,$0666679E,$0666679F,$066667A6,$066667A7,$066667A8,$066667A9,$066667AA
@dcd $066667AB,$066667AC,$066667AD,$066667AE,$066667AF,$066667B6,$066667B7,$066667B8,$066667B9,$066667BA,$066667BB,$066667BC,$066667BD,$066667BE,$066667BF,$066667C6
@dcd $066667C7,$066667C8,$066667C9,$066667CA,$066667CB,$066667CC,$066667CD,$066667CE,$066667CF,$066667D6,$066667D7,$066667D8,$066667D9,$066667DA,$066667DB,$066667DC
@dcd $066667DD,$066667DE,$066667DF,$066667E6,$066667E7,$066667E8,$066667E9,$066667EA,$066667EB,$066667EC,$066667ED,$066667EE,$066667EF,$066667F6,$066667F7,$066667F8
@dcd $066667F9,$066667FA,$066667FB,$066667FC,$066667FD,$066667FE,$066667FF,$06666866,$06666867,$06666868,$06666869,$0666686A,$0666686B,$0666686C,$0666686D,$0666686E
@dcd $0666686F,$06666876,$06666877,$06666878,$06666879,$0666687A,$0666687B,$0666687C,$0666687D,$0666687E,$0666687F,$06666886,$06666887,$06666888,$06666889,$0666688A
@dcd $0666688B,$0666688C,$0666688D,$0666688E,$0666688F,$06666896,$06666897,$06666898,$06666899,$0666689A,$0666689B,$0666689C,$0666689D,$0666689E,$0666689F,$066668A6
@dcd $066668A7,$066668A8,$066668A9,$066668AA,$066668AB,$066668AC,$066668AD,$066668AE,$066668AF,$066668B6,$066668B7,$066668B8,$066668B9,$066668BA,$066668BB
; bits 8 to 15:
@dcd $00000000
@dcd $066668BC,$06666B78,$06666DCE,$0666768A,$066678E6,$06667B9C,$06667DF8,$066686AE,$0666896A,$06668BC6,$06668E7C,$066696D8,$0666998E,$06669BEA,$06669EA6,$0666A6FC
@dcd $0666A9B8,$0666AC6E,$0666AECA,$0666B786,$0666B9DC,$0666BC98,$0666BEEE,$0666C7AA,$0666CA66,$0666CCBC,$0666CF78,$0666D7CE,$0666DA8A,$0666DCE6,$0666DF9C,$0666E7F8
@dcd $0666EAAE,$0666ED6A,$0666EFC6,$0666F87C,$0666FAD8,$0666FD8E,$0666FFEA,$066768A6,$06676AFC,$06676DB8,$0667766E,$066778CA,$06677B86,$06677DDC,$06678698,$066788EE
@dcd $06678BAA,$06678E66,$066796BC,$06679978,$06679BCE,$06679E8A,$0667A6E6,$0667A99C,$0667ABF8,$0667AEAE,$0667B76A,$0667B9C6,$0667BC7C,$0667BED8,$0667C78E,$0667C9EA
@dcd $0667CCA6,$0667CEFC,$0667D7B8,$0667DA6E,$0667DCCA,$0667DF86,$0667E7DC,$0667EA98,$0667ECEE,$0667EFAA,$0667F866,$0667FABC,$0667FD78,$0667FFCE,$0668688A,$06686AE6
@dcd $06686D9C,$06686FF8,$066878AE,$06687B6A,$06687DC6,$0668867C,$066888D8,$06688B8E,$06688DEA,$066896A6,$066898FC,$06689BB8,$06689E6E,$0668A6CA,$0668A986,$0668ABDC
@dcd $0668AE98,$0668B6EE,$0668B9AA,$0668BC66,$0668BEBC,$0668C778,$0668C9CE,$0668CC8A,$0668CEE6,$0668D79C,$0668D9F8,$0668DCAE,$0668DF6A,$0668E7C6,$0668EA7C,$0668ECD8
@dcd $0668EF8E,$0668F7EA,$0668FAA6,$0668FCFC,$0668FFB8,$0669686E,$06696ACA,$06696D86,$06696FDC,$06697898,$06697AEE,$06697DAA,$06698666,$066988BC,$06698B78,$06698DCE
@dcd $0669968A,$066998E6,$06699B9C,$06699DF8,$0669A6AE,$0669A96A,$0669ABC6,$0669AE7C,$0669B6D8,$0669B98E,$0669BBEA,$0669BEA6,$0669C6FC,$0669C9B8,$0669CC6E,$0669CECA
@dcd $0669D786,$0669D9DC,$0669DC98,$0669DEEE,$0669E7AA,$0669EA66,$0669ECBC,$0669EF78,$0669F7CE,$0669FA8A,$0669FCE6,$0669FF9C,$066A67F8,$066A6AAE,$066A6D6A,$066A6FC6
@dcd $066A787C,$066A7AD8,$066A7D8E,$066A7FEA,$066A88A6,$066A8AFC,$066A8DB8,$066A966E,$066A98CA,$066A9B86,$066A9DDC,$066AA698,$066AA8EE,$066AABAA,$066AAE66,$066AB6BC
@dcd $066AB978,$066ABBCE,$066ABE8A,$066AC6E6,$066AC99C,$066ACBF8,$066ACEAE,$066AD76A,$066AD9C6,$066ADC7C,$066ADED8,$066AE78E,$066AE9EA,$066AECA6,$066AEEFC,$066AF7B8
@dcd $066AFA6E,$066AFCCA,$066AFF86,$066B67DC,$066B6A98,$066B6CEE,$066B6FAA,$066B7866,$066B7ABC,$066B7D78,$066B7FCE,$066B888A,$066B8AE6,$066B8D9C,$066B8FF8,$066B98AE
@dcd $066B9B6A,$066B9DC6,$066BA67C,$066BA8D8,$066BAB8E,$066BADEA,$066BB6A6,$066BB8FC,$066BBBB8,$066BBE6E,$066BC6CA,$066BC986,$066BCBDC,$066BCE98,$066BD6EE,$066BD9AA
@dcd $066BDC66,$066BDEBC,$066BE778,$066BE9CE,$066BEC8A,$066BEEE6,$066BF79C,$066BF9F8,$066BFCAE,$066BFF6A,$066C67C6,$066C6A7C,$066C6CD8,$066C6F8E,$066C77EA,$066C7AA6
@dcd $066C7CFC,$066C7FB8,$066C886E,$066C8ACA,$066C8D86,$066C8FDC,$066C9898,$066C9AEE,$066C9DAA,$066CA666,$066CA8BC,$066CAB78,$066CADCE,$066CB68A,$066CB8E6
; bits 16 - 23
@dcd $00000000
@dcd $066CBB9C,$067976D8,$067FCC6E,$068C87AA,$0698DCE6,$069F987C,$06ABEDB8,$06B8A8EE,$06BEFE8A,$06CBB9C6,$06D86EFC,$06DECA98,$06EB7FCE,$06F7DB6A,$06FE96A6,$076AEBDC
@dcd $0777A778,$077DFCAE,$078AB7EA,$07976D86,$079DC8BC,$07AA7DF8,$07B6D98E,$07BD8ECA,$07C9EA66,$07D69F9C,$07DCFAD8,$07E9B66E,$07F66BAA,$07FCC6E6,$08697C7C,$086FD7B8
@dcd $087C8CEE,$0888E88A,$088F9DC6,$089BF8FC,$08A8AE98,$08AF69CE,$08BBBF6A,$08C87AA6,$08CECFDC,$08DB8B78,$08E7E6AE,$08EE9BEA,$08FAF786,$0967ACBC,$096E67F8,$097ABD8E
@dcd $098778CA,$098DCE66,$099A899C,$09A6DED8,$09AD9A6E,$09B9EFAA,$09C6AAE6,$09CD667C,$09D9BBB8,$09E676EE,$09ECCC8A,$09F987C6,$09FFDCFC,$0A6C9898,$0A78EDCE,$0A7FA96A
@dcd $0A8BFEA6,$0A98B9DC,$0A9F6F78,$0AABCAAE,$0AB87FEA,$0ABEDB86,$0ACB96BC,$0AD7EBF8,$0ADEA78E,$0AEAFCCA,$0AF7B866,$0AFE6D9C,$0B6AC8D8,$0B777E6E,$0B7DD9AA,$0B8A8EE6
@dcd $0B96EA7C,$0B9D9FB8,$0BA9FAEE,$0BB6B68A,$0BBD6BC6,$0BC9C6FC,$0BD67C98,$0BDCD7CE,$0BE98D6A,$0BEFE8A6,$0BFC9DDC,$0C68F978,$0C6FAEAE,$0C7C69EA,$0C88BF86,$0C8F7ABC
@dcd $0C9BCFF8,$0CA88B8E,$0CAEE6CA,$0CBB9C66,$0CC7F79C,$0CCEACD8,$0CDB686E,$0CE7BDAA,$0CEE78E6,$0CFACE7C,$0D6789B8,$0D6DDEEE,$0D7A9A8A,$0D86EFC6,$0D8DAAFC,$0D9A6698
@dcd $0DA6BBCE,$0DAD776A,$0DB9CCA6,$0DC687DC,$0DCCDD78,$0DD998AE,$0DDFEDEA,$0DECA986,$0DF8FEBC,$0DFFB9F8,$0E6C6F8E,$0E78CACA,$0E7F8666,$0E8BDB9C,$0E9896D8,$0E9EEC6E
@dcd $0EABA7AA,$0EB7FCE6,$0EBEB87C,$0ECB6DB8,$0ED7C8EE,$0EDE7E8A,$0EEAD9C6,$0EF78EFC,$0EFDEA98,$0F6A9FCE,$0F76FB6A,$0F7DB6A6,$0F8A6BDC,$0F96C778,$0F9D7CAE,$0FA9D7EA
@dcd $0FB68D86,$0FBCE8BC,$0FC99DF8,$0FCFF98E,$0FDCAECA,$0FE96A66,$0FEFBF9C,$0FFC7AD8,$1668D66E,$166F8BAA,$167BE6E6,$16889C7C,$168EF7B8,$169BACEE,$16A8688A,$16AEBDC6
@dcd $16BB78FC,$16C7CE98,$16CE89CE,$16DADF6A,$16E79AA6,$16EDEFDC,$16FAAB78,$176766AE,$176DBBEA,$177A7786,$1786CCBC,$178D87F8,$1799DD8E,$17A698CA,$17ACEE66,$17B9A99C
@dcd $17BFFED8,$17CCBA6E,$17D96FAA,$17DFCAE6,$17EC867C,$17F8DBB8,$17FF96EE,$186BEC8A,$1878A7C6,$187EFCFC,$188BB898,$18986DCE,$189EC96A,$18AB7EA6,$18B7D9DC,$18BE8F78
@dcd $18CAEAAE,$18D79FEA,$18DDFB86,$18EAB6BC,$18F76BF8,$18FDC78E,$196A7CCA,$1976D866,$197D8D9C,$1989E8D8,$19969E6E,$199CF9AA,$19A9AEE6,$19B66A7C,$19BCBFB8,$19C97AEE
@dcd $19CFD68A,$19DC8BC6,$19E8E6FC,$19EF9C98,$19FBF7CE,$1A68AD6A,$1A6F68A6,$1A7BBDDC,$1A887978,$1A8ECEAE,$1A9B89EA,$1AA7DF86,$1AAE9ABC,$1ABAEFF8,$1AC7AB8E,$1ACE66CA
@dcd $1ADABC66,$1AE7779C,$1AEDCCD8,$1AFA886E,$1B66DDAA,$1B6D98E6,$1B79EE7C,$1B86A9B8,$1B8CFEEE,$1B99BA8A,$1BA66FC6,$1BACCAFC,$1BB98698,$1BBFDBCE,$1BCC976A,$1BD8ECA6
@dcd $1BDFA7DC,$1BEBFD78,$1BF8B8AE,$1BFF6DEA,$1C6BC986,$1C787EBC,$1C7ED9F8,$1C8B8F8E,$1C97EACA,$1C9EA666,$1CAAFB9C,$1CB7B6D8,$1CBE6C6E,$1CCAC7AA,$1CD77CE6
; bits 24 - 32
@dcd $00000000
@dcd $1CDDD87C,$39BBAA98,$56997CAE,$6D76EECA,$89EEC6E6,$166CC98F,$17DAA6B7,$19A87DD8,$1B6FFAFA,$1CDDD87C,$1EABAF9D,$26798CBF,$27E769E6,$29AEE768,$2B7CBE8A,$2CEA9BAB
@dcd $2EB878CD,$367FEFEE,$37EDCD76,$39BBAA98,$3B8987B9,$3CF6FEDB,$3EBEDBFC,$468CB97E,$47FA96A6,$49C86DC7,$4B8FEAE9,$4CFDC86A,$4ECB9F8C,$56997CAE,$5866F9CF,$59CED6F7
@dcd $5B9CAE78,$5D6A8B9A,$5ED868BC,$669FDFDD,$686DBCFF,$69DB9A86,$6BA977A8,$6D76EECA,$6EDECBEB,$76ACA96D,$787A868E,$79E7FDB6,$7BAFDAD8,$7D7DB7F9,$7EEB8F7B,$86B96C9C
@dcd $8886E9BE,$89EEC6E6,$8BBC9E67,$8D8A7B89,$8EF7F8AA,$96BFCFCC,$988DACEE,$99FB8A6F,$9BC96797,$9D96DEB8,$9EFEBBDA

Some bugs may remain in special cases, but I have no time to spend on this piece of code at the moment.
This approach is also well suited to values wider than 32 bits.

Have fun !

-- Kay

I will have to test this when I find some time. Thanks a lot Kay!

I particularly enjoy how it ends

Code:

loop:
b loop

Very efficient ;)

I too would prefer it as a function, with defined inputs and outputs, that I can drop into existing code.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

For Tepples :

Code:

@ARM

Binary_to_BCD_conversion:
; in R0 => binary value [24 bits max]
; out R1 <= BCD value [8 digits 0 - 99999999]

stmfd sp!, {r2,r3,r4,r5}

and r1,r0,$ff

ldr r3,=BCD_add_table ; BCD lookup table

ldr r2,[r3,r1 lsl 2] ; BCD (read BCD table)

mov r1,r0 lsr 8
and r1,r1,$ff
add r3,r3,1024
ldr r1,[r3,r1 lsl 2] ; BCD (read BCD table)

ldr r5,=$11111110 ; carry count mask value

add r4,r2,r1 ; r4 = r2 + r1
eor r2,r1,r2 ; r2 = r1 XOR r2
eor r2,r4,r2 ; r2 = r4 XOR r2
bic r2,r5,r2 ; r2 = 0x11111110 AND NOT r2
orr r2,r2,r2 lsr 1 ; r2 = r2 OR (r2 LSR 1)
sub r1,r4,r2 lsr 2 ; r1 = r4 - (r2 LSR 2)

mov r2,r0 lsr 16
and r2,r2,$ff
add r3,r3,1024
ldr r2,[r3,r2 lsl 2] ; BCD (read BCD table)

add r4,r2,r1 ; r4 = r2 + r1
eor r2,r1,r2 ; r2 = r1 XOR r2
eor r2,r4,r2 ; r2 = r4 XOR r2
bic r2,r5,r2 ; r2 = 0x11111110 AND NOT r2
orr r2,r2,r2 lsr 1 ; r2 = r2 OR (r2 LSR 1)
sub r1,r4,r2 lsr 2 ; r1 = r4 - (r2 LSR 2)

ldmfd sp!, {r2,r3,r4,r5}
bx lr

BCD_add_table:

<<Please place the latest table hereafter.>>

Please give me a little credit if you use it ;)

Miked0801 pretends earlier in this post, that there's a faster way.
So what about [without consuming 16 MB of lookup table] ??? ;)
I'm very curious ...

-- Kay
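
For anyone who wants to follow the carry-correction trick outside of ARM syntax, here is a minimal C sketch of the same idea: look up one pre-converted BCD word per input byte, then add them with the add / eor / eor / bic / orr / sub sequence. The posted table entries appear to carry a 0x06666666 offset (for example, $066668BC is 0x256 plus that offset), and the sketch relies on that. Everything else is an assumption for illustration: the names `to_bcd_slow`, `bcd_add_biased`, `bin24_to_bcd` and `bcd_to_string` are hypothetical, the tables are built at run time rather than pasted in, and the low-byte table is left without the offset, which may differ from Kay's layout.

```c
#include <stdint.h>

#define BCD_BIAS 0x06666666u   /* 6 in each of the low seven digits */

/* Slow reference conversion, used here only to fill the tables. */
uint32_t to_bcd_slow(uint32_t n)
{
    uint32_t bcd = 0;
    for (int shift = 0; shift < 32; shift += 4) {
        bcd |= (n % 10) << shift;
        n /= 10;
    }
    return bcd;
}

/* Packed-BCD add. 'biased' must already hold its BCD value plus BCD_BIAS,
   so every decimal carry shows up as a binary carry between nibbles. */
uint32_t bcd_add_biased(uint32_t plain, uint32_t biased)
{
    uint32_t sum      = plain + biased;
    uint32_t carries  = sum ^ plain ^ biased;           /* carry-in bit of every position */
    uint32_t no_carry = ~carries & 0x11111110u;         /* digits that did not carry out */
    uint32_t fix      = (no_carry >> 2) | (no_carry >> 3); /* 6 for each such digit */
    return sum - fix;                                   /* remove the unused bias */
}

static uint32_t bcd_table[3][256];  /* [byte position][byte value] */
static int bcd_table_ready;

static void bcd_table_init(void)
{
    for (uint32_t pos = 0; pos < 3; pos++) {
        for (uint32_t i = 0; i < 256; i++) {
            uint32_t bcd = to_bcd_slow(i << (8 * pos));
            /* Assumption: only the tables used as the second operand are biased. */
            bcd_table[pos][i] = (pos == 0) ? bcd : bcd + BCD_BIAS;
        }
    }
    bcd_table_ready = 1;
}

/* 24-bit binary value in, up to 8 packed BCD digits out. */
uint32_t bin24_to_bcd(uint32_t n)
{
    if (!bcd_table_ready)
        bcd_table_init();
    uint32_t bcd = bcd_table[0][n & 0xff];
    bcd = bcd_add_biased(bcd, bcd_table[1][(n >> 8) & 0xff]);
    bcd = bcd_add_biased(bcd, bcd_table[2][(n >> 16) & 0xff]);
    return bcd;
}

/* Packed BCD to a decimal string without leading zeros; 'out' needs 9 bytes. */
char *bcd_to_string(uint32_t bcd, char out[9])
{
    char *p = out;
    int shift = 28;
    while (shift > 0 && ((bcd >> shift) & 0xf) == 0)
        shift -= 4;                     /* skip leading zeros, keep the last digit */
    for (; shift >= 0; shift -= 4)
        *p++ = (char)('0' + ((bcd >> shift) & 0xf));
    *p = '\0';
    return out;
}
```

With this layout, `bin24_to_bcd(65535)` returns 0x00065535, and `bcd_to_string` turns that into "65535". The top digit has no offset and no correction, so the sketch is only meant for results that fit in 8 decimal digits, which every 24-bit input does.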

Thanks.

Quote:

Miked0801 pretends earlier in this post, that there's a faster way.
So what about [without consuming 16 MB of lookup table] ??? ;)
I'm very curious ...

Lol - my response had only to do with the looping over the swi 16 way back when. The only pretending I'm doing here is in understanding exactly how this works. Too much bit fun for me to just read the code and understand what it is you are doing. From just a technical view though, you guys have got this routine tight. No looping, only 3 memory accesses, very few mov instructions, lots of free shifting, etc.

That said, I'd investigate reversing the initial 1234567 such that you don't need to "and" & shift each time you read from the register - you should be able to just shift it around as needed. Also, it feels like the mov r2,r0 lsr 16 could somehow be optimized away - though how I'm not sure. Just an intuition thing. Other comments welcome :)

Miked0801 wrote:

Also, it feels like the mov r2,r0 lsr 16 could somehow be optimized away - though how I'm not sure. Just an intuition thing. Other comments welcome :)

Something like this?

Code:

and r1,r0,$ff00
add r3,r3,1024
ldr r1,[r3,r1 lsr 6] ; BCD (read BCD table)
............
and r2,r0,$ff0000
add r3,r3,1024
ldr r2,[r3,r2 lsr 14] ; BCD (read BCD table)

Oh, and you don't have to save r2 & r3 to the stack, and you can use r12 without a problem. r0 should also be the return register, so the last line should read:

Code:

sub r0,r4,r2,lsr#2

_________________
I probably suck, my not is a programmer.
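
To see why the mov can go away, here is a small C sketch comparing the two ways of turning a byte of the input into a byte offset into a table of 32-bit words. The function names are hypothetical and only there for illustration. The first form mirrors the original mov / and / lsl 2 sequence; the second masks the byte where it sits and shifts right by two bits less, which is what the `lsr 6` and `lsr 14` in the snippet above do. Both $ff00 and $ff0000 are an 8-bit value rotated by an even amount, so they fit as ARM immediates.

```c
#include <stdint.h>

/* Original style: move the byte down, mask it, then scale by 4 (word size). */
uint32_t offset_shift_then_mask(uint32_t value, unsigned byte_pos)
{
    return ((value >> (8 * byte_pos)) & 0xffu) << 2;
}

/* Suggested style: mask the byte in place, then a single right shift.
   Only meaningful for byte_pos 1..3, where the shift count stays positive. */
uint32_t offset_mask_in_place(uint32_t value, unsigned byte_pos)
{
    return (value & (0xffu << (8 * byte_pos))) >> (8 * byte_pos - 2);
}
```

For byte 1 the second form is `(value & 0xff00) >> 6`, and for byte 2 it is `(value & 0xff0000) >> 14`. On ARM the shift folds into the ldr addressing mode, so each lookup needs one and plus the load.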

Yep - something like that. In ARM, if you are using a mov instruction just for a shift, and other instructions around it aren't shifting, there's almost always a way to remove the mov. Just something to watch for.

How about the number-reversing thing? Anyone want to take a quick crack at that? I bet it removes at least 1 instruction off the front.

My code:

Code:

~~~
~~~

Edit: Added the (hopefully working) integer_to_string function.
Edit: Doesn't work; too lazy to debug.

Last edited by strager on Tue May 17, 2005 9:50 pm; edited 1 time in total

strager wrote:

My code (in the works):

Code:

~~~
@
@ These might work.. Haven't tested :-)

Edit: Added the (hopefully working) integer_to_string function.

:-) Man, this is bad. It's a good idea to test before you post.
This will not even compile...

Yes, most of the time when I read that it was not tested, I don't even bother trying it, unless I know the coder is very good. Please test your code.

ASM > Integer -> String

#7620 - Lupin - Sat Jun 21, 2003 3:27 pm

#7625 - DekuTree64 - Sat Jun 21, 2003 4:05 pm

#7626 - Lupin - Sat Jun 21, 2003 4:16 pm

#7666 - Lupin - Sun Jun 22, 2003 7:16 pm

#19357 - mr_schmoe - Sat Apr 17, 2004 9:54 pm

#19363 - DekuTree64 - Sat Apr 17, 2004 11:46 pm

#19367 - poslundc - Sun Apr 18, 2004 1:26 am

#23832 - Kay - Thu Jul 22, 2004 4:57 pm

#23855 - MumblyJoe - Fri Jul 23, 2004 1:12 am

#23856 - poslundc - Fri Jul 23, 2004 2:46 am

#23875 - col - Fri Jul 23, 2004 10:49 am

#23878 - Kay - Fri Jul 23, 2004 12:25 pm

#23890 - f(DarkAngel) - Fri Jul 23, 2004 4:09 pm

#23891 - poslundc - Fri Jul 23, 2004 4:21 pm

#23895 - Miked0801 - Fri Jul 23, 2004 6:30 pm

#26153 - isildur - Wed Sep 08, 2004 5:04 pm

#26156 - FluBBa - Wed Sep 08, 2004 5:55 pm

#26159 - isildur - Wed Sep 08, 2004 6:10 pm

#26191 - FluBBa - Thu Sep 09, 2004 9:09 am

#26200 - isildur - Thu Sep 09, 2004 2:42 pm

#26201 - FluBBa - Thu Sep 09, 2004 3:43 pm

#26203 - isildur - Thu Sep 09, 2004 4:04 pm

#26210 - Miked0801 - Thu Sep 09, 2004 5:46 pm

#26225 - ecurtz - Thu Sep 09, 2004 8:46 pm

#26254 - FluBBa - Fri Sep 10, 2004 9:33 am

#26263 - isildur - Fri Sep 10, 2004 5:50 pm

#26372 - [mRg] - Tue Sep 14, 2004 12:24 am

#26390 - f(DarkAngel) - Tue Sep 14, 2004 3:02 pm

#26393 - isildur - Tue Sep 14, 2004 3:43 pm

#26395 - f(DarkAngel) - Tue Sep 14, 2004 5:18 pm

#26400 - SmileyDude - Tue Sep 14, 2004 7:09 pm

#26413 - keldon - Wed Sep 15, 2004 1:20 am

#26470 - f(DarkAngel) - Thu Sep 16, 2004 2:38 pm

#26473 - FluBBa - Thu Sep 16, 2004 3:44 pm

#27190 - Kay - Tue Oct 05, 2004 9:22 pm

#34744 - tum_ - Tue Jan 25, 2005 11:30 am

#34745 - tum_ - Tue Jan 25, 2005 12:42 pm

#34796 - FluBBa - Wed Jan 26, 2005 11:26 am

#34810 - isildur - Wed Jan 26, 2005 5:26 pm

#35054 - Lupin - Sat Jan 29, 2005 10:45 pm

#41497 - gator - Fri Apr 29, 2005 4:50 am

#41501 - tepples - Fri Apr 29, 2005 5:10 am

#41524 - isildur - Fri Apr 29, 2005 2:21 pm

#41527 - FluBBa - Fri Apr 29, 2005 2:48 pm

#42018 - Kay - Wed May 04, 2005 12:40 pm

#42023 - isildur - Wed May 04, 2005 3:04 pm

#42027 - Miked0801 - Wed May 04, 2005 6:06 pm

#42031 - tepples - Wed May 04, 2005 6:55 pm

#42124 - Kay - Fri May 06, 2005 12:14 pm

#42132 - tepples - Fri May 06, 2005 2:38 pm

#42141 - Miked0801 - Fri May 06, 2005 6:22 pm

#42246 - FluBBa - Sun May 08, 2005 10:53 am

#42323 - Miked0801 - Mon May 09, 2005 10:27 pm

#42934 - strager - Tue May 17, 2005 12:05 am

#42946 - tum_ - Tue May 17, 2005 8:15 am

#42973 - isildur - Tue May 17, 2005 6:09 pm