gbadev.org forum archive

1.
Is this dangerous?

Code:

LDR r0, =someVar     @ r0 = address of someVar (placeholder label)
LDR r0, [r0]         @ r0 = the value stored at that address

It works on both VBA and No$GBA. I haven't tried it on hardware, but is it supposed to work, or is it a bad choice?

2.
Are there better ways to copy things in Thumb mode than this?

Code:

LDR r3, =200
Copy:
LDR r2, [r0]
STR r2, [r1]
ADD r0, #4
ADD r0, #4
SUB r3, #1
CMP r3, #0
BNE Copy

I would be glad if these two questions could be answered.
Thanks in advance!

I'm pretty sure 1 is fine. It should work, and should not cause any performance problems.

I'm not sure about 2 as I haven't used thumb myself, but in ARM you can use the multiple register transfer instructions. Are these available in thumb? *checks book* Yes, they are. Using LDMIA/STMIA will be faster. Also, your last instruction before the branch, "CMP r3, #0", can be removed as the SUB will set the Z condition code for you.

ProblemBaby wrote:

1.
Is this dangerous?

Code:

LDR r0, =someVar     @ r0 = address of someVar (placeholder label)
LDR r0, [r0]         @ r0 = the value stored at that address

It works on both VBA and No$GBA. I haven't tried it on hardware, but is it supposed to work, or is it a bad choice?

This should work at any time.
There is one problem that can occur when you do this a lot:
the literal pool can end up farther away than the PC-relative load can reach (in Thumb, only about 1 KB ahead of the load; in ARM, about 4 KB in either direction).

This means you have to tell the assembler to write out the literal pool closer to the loads, and branch over it.
The GNU assembler directive for writing out the current literal pool is ".pool" (".ltorg" also works); the ARM SDT equivalent is "LTORG".
Example:

Code:

b Skip_Pool
.pool
Skip_Pool:
@ ... code continues here ...
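To see why the pool can drift out of range, here is a small C sketch of how far a PC-relative literal load can reach. It assumes the ARMv4T (ARM7TDMI) encodings: the Thumb "LDR Rd, [PC, #imm]" form takes an 8-bit immediate scaled by 4 and only looks forward from the word-aligned PC (instruction address + 4), while the ARM-state form has a 12-bit offset in either direction from the instruction address + 8. The function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Can a Thumb "LDR Rd, [PC, #imm]" at insn_addr reach a literal at lit_addr?
   Offset is imm8 * 4, i.e. 0..1020 bytes, added to (insn_addr + 4) with bit 1 cleared. */
static bool thumb_literal_reachable(uint32_t insn_addr, uint32_t lit_addr)
{
    uint32_t base = (insn_addr + 4u) & ~3u;   /* PC value the load actually uses */
    if (lit_addr < base || (lit_addr & 3u) != 0)
        return false;                         /* must be word-aligned and ahead of PC */
    return lit_addr - base <= 1020u;          /* 255 * 4 */
}

/* ARM-state "LDR Rd, [PC, #+/-imm12]": up to 4095 bytes either way from insn_addr + 8. */
static bool arm_literal_reachable(uint32_t insn_addr, uint32_t lit_addr)
{
    int64_t delta = (int64_t)lit_addr - (int64_t)(insn_addr + 8u);
    return delta >= -4095 && delta <= 4095;
}
```

In practice this means that in Thumb code a ".pool" (or ".ltorg") has to appear within roughly 1 KB after every "LDR rX, =..." that uses it, which is why long Thumb functions run into the problem much sooner than ARM ones.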

ProblemBaby wrote:

2.
Are there better ways to copy things in Thumb mode than this?

Code:

LDR r3, =200
Copy:
LDR r2, [r0]
STR r2, [r1]
ADD r0, #4
ADD r0, #4
SUB r3, #1
CMP r3, #0
BNE Copy

I believe most Thumb ALU instructions always set the condition flags.
So I'd say you can remove that CMP instruction.
Also, why are you adding 4 to r0 twice?
I think you meant r1 in the second one.

Code:

LDR r3, =200
Copy:
LDR r2, [r0]
STR r2, [r1]
ADD r0, #4
ADD r1, #4
SUB r3, #1
BNE Copy

The ARM equivalent of copying memory blocks is much nicer:

Code:

LDR r3,=200
Copy:
LDR r2,[r0],#4
STR r2,[r1],#4
SUBS r3,r3,#1
BNE Copy

Cheers.
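In these copy loops r3 counts words, not bytes: every pass moves one 32-bit word and bumps each pointer by 4. A C model makes the arithmetic easy to check. This is a sketch, not anyone's posted code; the function name and the extra step parameter are hypothetical, added only to compare decrementing the counter by 1 with decrementing it by 4.

```c
#include <stddef.h>
#include <stdint.h>

/* C model of the loops above: src = r0, dst = r1, count = r3 (number of WORDS).
   Each pass copies one word, advances both pointers by 4 bytes (one uint32_t),
   and subtracts `step` from the counter; the loop ends when the counter reaches
   zero, like BNE falling through on Z. Returns how many words were copied.
   Assumes count is a nonzero multiple of step. */
static size_t copy_words(uint32_t *dst, const uint32_t *src, uint32_t count, uint32_t step)
{
    size_t copied = 0;
    do {
        *dst++ = *src++;   /* LDR r2,[r0],#4 then STR r2,[r1],#4 */
        copied++;
        count -= step;     /* SUBS r3,r3,#step */
    } while (count != 0);  /* BNE Copy */
    return copied;
}
```

With a count of 200, a step of 1 copies all 200 words (800 bytes), while a step of 4 stops after 50 words.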

Actually, you can use the THUMB multiple load/stores to do the exact same thing as that ARM copy:

Code:

LDR r3, =200
Copy:
LDMIA r0!, {r2}
STMIA r1!, {r2}
SUB r3, #1
BNE Copy

And yes, that load address/load value sequence is the normal way to do it. That's one of the main reasons it's good to put related globals in a struct: you load the address of the struct once, and all the individual variables are at fixed offsets from it, or better yet, can be loaded with a single ldmia.
The other solution, if you're writing code that will be in RAM, is to define your globals right near your function, so you can load them directly by their labels (which assembles to a PC-relative load), like this:

Code:

.section .iwram, "ax", %progbits
.global var1
var1:
.word 0
.global var2
var2:
.word 0

.global function
.arm
.align 2
function:
ldr r0, var1

That's kind of a hacky way to do it, but it does work if you need to do a lot of individual loading/storing and don't have a free register to store the struct address.
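Here is the struct idea from the first paragraph of this post as a hypothetical C sketch (Player, g_player, load_member and the PLAYER_* names are all made up). Assembly code would load the struct's address once, for example "LDR r0, =g_player", and then reach each member at a fixed offset such as "LDR r1, [r0, #4]"; since the three members are consecutive words, "LDMIA r0, {r1-r3}" could fetch them all at once.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Related globals grouped in one struct. */
typedef struct {
    int32_t x;      /* offset 0 */
    int32_t y;      /* offset 4 */
    int32_t speed;  /* offset 8 */
} Player;

Player g_player = { 10, 20, 3 };

/* Member offsets that an assembly file could mirror with ".equ PLAYER_Y, 4" etc. */
enum {
    PLAYER_X     = offsetof(Player, x),
    PLAYER_Y     = offsetof(Player, y),
    PLAYER_SPEED = offsetof(Player, speed)
};

/* C stand-in for a "base + offset" load: read the int32 member at a byte offset. */
static int32_t load_member(const Player *base, size_t offset)
{
    int32_t v;
    memcpy(&v, (const unsigned char *)base + offset, sizeof v);
    return v;
}
```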
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

Thanks for the posts!
They really helped.

I have one more question. I have an ASM function that is called from C; the prototype looks something like this:

Code:

void Print(u8 x, u8 y, char *text)

It doesn't work well at all: if I pass the same text pointer to the function a second time, it doesn't work. I tried not incrementing the pointer at all, but it still got changed. Is there something I need to know about this kind of function?

You can increment the pointer all you want, because it's only copied into a register when calling the function (r2, in this case), just as long as you don't modify the data that it's pointing to (which you most likely aren't doing). I don't know why it would work once and not twice though. Would you mind posting some code from the Print function?

Sure!
Note that this isn't optimized yet.

r2 is the string.
r0 is where the current line of text started.
r1 is where the next character should be placed.
The font is 8x16, so I have to copy two tiles per character.
As you can see in the code, the way the map is organized may be a bit tricky to understand, but it's quite simple: the top tile of, for example, the character 'A'
is stored at mapFont['A']
and the bottom tile is at mapFont['A'+128].
This makes it easy to copy.

Code:

CopyCharacter:
   LDRB r3, [r2]

   CMP r3, #0
   BEQ Finished
   CMP r3, #'\n'
   BNE NoNewRow

   ADD r0, #128
   MOV r1, r0
   ADD r2, #1
   B CopyCharacter
NoNewRow:

   LSL r3, #1
   LDR r4, =g_mapFont
   ADD r4, r3

   LDRH r3, [r4]
   STRH r3, [r1]
   ADD r4, #128
   ADD r4, #128
   ADD r1, #64
   LDRH r3, [r4]
   STRH r3, [r1]

   SUB r1, #62
   ADD r2, #1
   B CopyCharacter
Finished:
   BX lr
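A C model can make the map arithmetic in that loop easier to follow. This is a hypothetical sketch, not the poster's code: it assumes a 32-tile-wide map of 16-bit entries (so "ADD r1, #64" moves down one row and "ADD r0, #128" moves down two), uses uint8_t in place of u8, and assumes that the part of the routine not shown turns x and y into the starting map position.

```c
#include <stdint.h>

#define MAP_WIDTH 32   /* assumption: 32 map entries (64 bytes) per row */

/* Hypothetical font map: entry c is the top tile of character c,
   entry c + 128 is its bottom tile. */
static uint16_t g_mapFont[256];

/* C model of the Thumb Print routine. Assumption: the unshown prologue sets
   r0 = r1 = &map[y * MAP_WIDTH + x]. */
static void Print(uint16_t *map, uint8_t x, uint8_t y, const char *text)
{
    uint16_t *row = map + y * MAP_WIDTH + x;  /* r0: start of the current text line */
    uint16_t *pos = row;                      /* r1: where the next character goes */

    for (; *text != '\0'; text++) {           /* LDRB r3,[r2] / CMP r3,#0 / BEQ Finished */
        unsigned char c = (unsigned char)*text;
        if (c == '\n') {
            row += 2 * MAP_WIDTH;             /* ADD r0,#128: down two rows (one 16-pixel line) */
            pos = row;                        /* MOV r1,r0 */
            continue;
        }
        if (c >= 128)
            continue;                         /* layout only covers chars 0-127 (the asm does not check) */
        pos[0] = g_mapFont[c];                /* top tile */
        pos[MAP_WIDTH] = g_mapFont[c + 128];  /* bottom tile: ADD r1,#64 is one row down */
        pos++;                                /* ADD r1,#64 then SUB r1,#62: net +2 bytes = next cell */
    }
}
```

Note that the model needs no extra saved state: the top and bottom tiles are addressed from the same position, which is what the asm achieves with r4 (the register that, as pointed out next, is not being preserved).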

You don't seem to be restoring the value of r4 before returning. This could possibly cause the problem you mentioned. Quick fix: use r12 instead of r4. (You are allowed to corrupt r0-r3 and r12. Everything else must be restored before returning from the function.)

jd wrote:

(You are allowed to corrupt r0-r3 and r12. Everything else must be restored before returning from the function.)

Thanks, I had no idea about that.
Sadly, it didn't solve the problem!
But I've used r4, r5, r6 and r7 in other functions without saving them.
Maybe the compiler fixes this automatically (if I call it from C)?

The code is in Thumb mode. Is it still possible to use r12?

ProblemBaby wrote:

The code is in Thumb mode. Is it still possible to use r12?

I must confess, I'm not sure. I forgot the code was thumb - I've only really used ARM myself but I would assume the rules are the same.

jd:
I solved the problem. The first time I tried, I did something wrong,
but now it works. Many thanks!

Using r12 doesn't seem to work.
I tried this simple thing:
LDR r12, =20
and the assembler gave me an error.
Maybe I can use it in other ways?

I also wonder: does it take a long time to push variables onto the stack?
For example, is it worth using one more register instead of the two ADDs?
It will be called quite a few times if the string is long!

ProblemBaby wrote:

Using r12 doesn't seem to work.
I tried this simple thing:
LDR r12, =20
and the assembler gave me an error.
Maybe I can use it in other ways?

I seem to remember that thumb imposes limitations on which instructions can use which registers, so that might be the problem you're running into here. Personally, I'd recommend avoiding thumb assembler and using ARM assembler in IWRAM instead if you need performance as it's easier and faster. If you don't need performance then C compiled as thumb in ROM would be a better choice IMHO. Still, it's up to you.

ProblemBaby wrote:

I also wonder: does it take a long time to push variables onto the stack?

Accessing the stack is pretty quick.

ProblemBaby wrote:

For example, is it worth using one more register instead of the two ADDs?

That depends on how many times you expect the extra ADD to be executed, and also on what type of memory your code and the stack are in.

(Although if you were using ARM you'd be able to get rid of both of the ADDs and a lot more besides. :)

Last edited by jd on Tue Aug 10, 2004 3:24 am; edited 1 time in total

Glad to hear you got it working. I couldn't figure out what was wrong with it, so I'd planned on giving it a closer look later.

As for r12, there is nothing actually special about it, except that the ARM procedure call standard says you can corrupt it, which means compilers never rely on it surviving a function call. Accessing it in THUMB works the same as for the other upper registers: only special variants of a few instructions can use them (mov, add, cmp and bx; not sub, and not ldr, which is why "LDR r12, =20" fails). Generally you can just ignore it in THUMB mode, and store/load r4-r7 on the stack using push and pop. Push does exactly the same thing as an stmfd sp! in ARM, and pop is an ldmfd sp!. Push takes 1 + NumberOfRegs cycles to execute, and pop takes 2 + NumberOfRegs. Since the stack is in IWRAM, there are no wait states, so those are the exact times.

Most of the time, you can just push as many registers as you want right at the start of a function and pop them right at the end, so you never have to worry about them in between. One push/pop per function call won't matter at all in the long run, unless the function is called a lot from an inner loop. In that case you can write the outer loop in ASM too, so you know what's in each register when the function gets called, and you don't have to preserve or reload anything at all.
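To put those push/pop figures in perspective, here they are as a tiny C sketch (function names hypothetical). It uses exactly the numbers quoted above for a zero-wait-state IWRAM stack, PUSH = n + 1 and POP = n + 2 cycles for n registers, and it deliberately ignores the extra pipeline refill when a pop loads pc.

```c
/* Cycle counts for Thumb PUSH/POP of `nregs` registers with the stack in IWRAM,
   per the figures in the post (no wait states, pc not in the pop list). */
static unsigned push_cycles(unsigned nregs) { return nregs + 1u; }
static unsigned pop_cycles(unsigned nregs)  { return nregs + 2u; }

/* Total save/restore overhead for one call that pushes and pops `nregs` registers. */
static unsigned save_restore_cycles(unsigned nregs)
{
    return push_cycles(nregs) + pop_cycles(nregs);
}
```

Going by these figures, saving r4-r7 costs 5 + 6 = 11 cycles per call, and each extra register adds only 2 cycles per call. So for the earlier Print question, keeping a constant in one more register to replace the two "ADD r4, #128" instructions with a single ADD should pay for itself after a few characters, assuming each ADD costs at least one cycle.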

I have one more question =)
If I want to make a function that takes more than 4 parameters,
what happens then?
Are they stored in r4, r5... or somewhere else?
If they aren't stored somewhere else, what about the rule that says
I can only use r0-r3 and have to push the others onto the stack before I can use them?

Thanks!

ProblemBaby wrote:

I have one more question =)
If I want to make a function that takes more than 4 parameters,
what happens then?

The extra parameters go on the stack.

So do I get them by doing this:
POP { r4, r5... }?

You have to save r4-r11 and r14 before you can load arguments into them.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

You don't want to change the stack pointer when retrieving them. The parameters ought to be left on the stack at the end of your routine. If necessary, make a copy of your stack pointer and use that.

Dan.

Yes, it's kind of irritating, especially in THUMB mode. You have to push some regs, and then load from sp+4*numberOfRegsPushed. In ARM, you can just copy the stack pointer to r12 (because it's free), then push the regs onto the stack, and load from r12. In THUMB, you can do about the same thing, but with 2 wasted instructions (mov to r12, push, mov r12 to something else, or push, mov sp to something, and add an offset to that). But then if you have enough args to be worth ldmia'ing, you most likely won't have any regs left to work in, so then it's easiest to just load them one at a time as needed, offsetting from sp. Another nice trick is to use
.equ var, offset
to define your stack vars, and that way you can account for the regs you push at the start and then never worry about it again.
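The offset arithmetic behind this is simple enough to check in C. This is a sketch under stated assumptions: every argument is one 32-bit word, arguments 1 to 4 arrive in r0-r3, and argument 5 sits at the caller's sp on entry. The function name and the *_OFS constants are hypothetical; the constants play the role of ".equ var, offset" for a function whose prologue is "push {r4-r7, lr}" (5 registers).

```c
/* Byte offset from sp of stack argument `arg` (1-based, arg >= 5)
   after the function has pushed `regs_pushed` registers. */
static unsigned stack_arg_offset(unsigned arg, unsigned regs_pushed)
{
    return 4u * regs_pushed    /* skip the registers pushed in the prologue */
         + 4u * (arg - 5u);    /* argument 5 is the first word on the caller's stack */
}

/* Named offsets in the spirit of ".equ", for a 5-register push. */
enum {
    REGS_PUSHED = 5,
    ARG5_OFS = 4 * REGS_PUSHED + 0,   /* 20 */
    ARG6_OFS = 4 * REGS_PUSHED + 4,   /* 24 */
    ARG7_OFS = 4 * REGS_PUSHED + 8    /* 28 */
};
```

In the assembly file the same idea would read ".equ ARG5_OFS, 20" followed by "ldr r4, [sp, #ARG5_OFS]", and if the prologue changes, only REGS_PUSHED has to be updated.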

Hmm, I want it to be easy to use from C code.
I didn't quite get what you meant.
Maybe one of you could add some code to the function below:

Code:

@ this function takes 7 arguments
.global ManyArgFunc
.thumb_func
ManyArgFunc:

@ ???

bx lr

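In Thumb mode, it would be something like this:

Code:

@ this function takes 7 arguments
.thumb
.align 2
.global ManyArgFunc
.thumb_func

ManyArgFunc:

push {r4-r7, lr}      @ save r4-r7 and the return address (5 regs = 20 bytes)
add r7, sp, #20       @ r7 = address of the 5th argument, just above the saved regs
ldmia r7!, {r4-r6}    @ r4-r6 = arguments 5 to 7

@ ...
@ r0 to r6 contain the 7 parameters
@ ...

mov r0, #0            @ the return value goes in r0 (0 is just a placeholder)

pop {r4-r7}           @ restore r4-r7
pop {r3}              @ Thumb pop can't load lr, so take the saved lr into r3
bx r3                 @ return; bx also copes with an ARM-state caller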

Just an example, not tested, but it should be correct.
Take a look at my LZSS source if you want an example for ARM mode.

See http://www.arm.com/products/DevTools/abi/aapcs.pdf too.

Happy coding,
DKL

Last edited by DKL on Tue Aug 17, 2004 10:19 am; edited 1 time in total

In case you're familiar with x86 assembly, for the parameters after the 4th it's similar to the cdecl calling convention: the caller pushes the parameters and cleans them up afterwards.

In the called function you don't actually push or pop them, though. Using ldr will usually be faster (it depends on what you're actually doing, though; you might alter the stack, etc.).
_________________
death scream...


ASM > Questions

#24665 - ProblemBaby - Mon Aug 09, 2004 3:27 am

#24669 - bakery2k - Mon Aug 09, 2004 5:27 am

#24673 - NEiM0D - Mon Aug 09, 2004 10:44 am

#24689 - DekuTree64 - Mon Aug 09, 2004 10:00 pm

#24691 - ProblemBaby - Mon Aug 09, 2004 11:54 pm

#24692 - DekuTree64 - Tue Aug 10, 2004 12:08 am

#24693 - ProblemBaby - Tue Aug 10, 2004 12:28 am

#24698 - jd - Tue Aug 10, 2004 12:59 am

#24700 - ProblemBaby - Tue Aug 10, 2004 1:09 am

#24701 - ProblemBaby - Tue Aug 10, 2004 1:32 am

#24705 - jd - Tue Aug 10, 2004 2:47 am

#24706 - ProblemBaby - Tue Aug 10, 2004 2:51 am

#24709 - jd - Tue Aug 10, 2004 3:09 am

#24710 - DekuTree64 - Tue Aug 10, 2004 3:13 am

#24887 - ProblemBaby - Fri Aug 13, 2004 1:50 am

#24888 - jd - Fri Aug 13, 2004 2:02 am

#24889 - ProblemBaby - Fri Aug 13, 2004 2:39 am

#24890 - tepples - Fri Aug 13, 2004 3:01 am

#24891 - poslundc - Fri Aug 13, 2004 3:09 am

#24893 - DekuTree64 - Fri Aug 13, 2004 4:09 am

#24903 - ProblemBaby - Fri Aug 13, 2004 12:09 pm

#24904 - DKL - Fri Aug 13, 2004 1:10 pm

#24933 - f(DarkAngel) - Fri Aug 13, 2004 6:19 pm