gbadev.org forum archive

I'm having trouble doing this.

The code I use is as follows:

Code:

my_func:

<.. some code ..>

stmia r13,{r14}
bl another_func
ldmia r13,{r14}

mov pc,lr

Essentially, I realized that you need to push r14/lr, because the bl instruction overwrites it with the return address (the address of the instruction after the bl).

The above method seems to work sometimes, but not always. Am I missing something? Is that the correct way to do it?

After a quick look I'm wondering why you are moving LR to PC at the end. Does it represent the return from your function? I don't think that will work when the calling function is written in THUMB, because then bit 0 of LR will be set to 1, so the CPU would try to run code from an improperly aligned address. To return from your function I think you should use bx lr.

The function call you've written, with LR pushed first, seems to be fine, although I can't remember off the top of my head whether you are using the right registers. You could also verify that bl is capable of exchanging instruction sets, but I think it is. Isn't it the same as bx, plus it stores LR for the return?

And btw, when you as the programmer are referring to SP and LR, for instance, I think you are better off actually writing SP and LR instead of r13 and r14. But maybe it's just me and my bad memory. :)

If none of the above helps maybe you can explain why it doesn't work. What happens and what are you expecting to happen, etc.
_________________
You can't beat our meat

Thanks for your reply.

Just to make it clear: That should be ARM asm not THUMB code and, well, I use mov pc,lr to return from a function.

You need to use stmfd r13!, {r14} and ldmfd r13!, {r14}. The stack starts at the end of IWRAM and grows down. fd means full descending (full meaning the current stack position is in use, so it will decrement it and then write r14), and the ! means store the new stack position after the decrement, so if the function you call uses the stack it won't overwrite the value you just wrote.
Also, when calling functions you should use stmfd r13!, {r0-r3, r14}, because the ARM procedure calling standard says the first 4 regs (or it might have been reg 0-4, which would be 5 regs. Anybody want to verify that?) are free to corrupt without storing them first, so anything you have in them before calling a function could be changed when it returns.
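
To make the difference between stmia r13,{r14} and stmfd r13!,{r14} concrete, here is a toy model of a full-descending stack. It is only a sketch: the Cpu class, the addresses and the values are all made up for illustration. It suggests that the store without writeback lands on the word already sitting at [sp], which belongs to whoever pushed last, while the stmfd version moves sp down first and leaves that word alone.

```python
# Toy model of a full-descending stack. Class name, addresses and values are hypothetical.
class Cpu:
    def __init__(self, sp, mem=None):
        self.sp = sp
        self.mem = dict(mem or {})   # address -> 32-bit word

    def stmfd_writeback(self, values):
        """stmfd sp!, {...}: lower sp first, then store (lowest register at lowest address)."""
        self.sp -= 4 * len(values)
        for i, value in enumerate(values):
            self.mem[self.sp + 4 * i] = value

    def ldmfd_writeback(self, count):
        """ldmfd sp!, {...}: load from sp upward, then raise sp."""
        values = [self.mem[self.sp + 4 * i] for i in range(count)]
        self.sp += 4 * count
        return values

    def stmia_no_writeback(self, values):
        """stmia sp, {...}: store starting at sp and leave sp unchanged."""
        for i, value in enumerate(values):
            self.mem[self.sp + 4 * i] = value


CALLER_WORD = 0x11111111   # something pushed earlier, currently at [sp]
RETURN_ADDR = 0x08000120   # hypothetical lr value

good = Cpu(sp=0x03007F00, mem={0x03007F00: CALLER_WORD})
good.stmfd_writeback([RETURN_ADDR])
print(hex(good.sp), hex(good.mem[0x03007F00]))   # sp moved down, caller's word intact

bad = Cpu(sp=0x03007F00, mem={0x03007F00: CALLER_WORD})
bad.stmia_no_writeback([RETURN_ADDR])
print(hex(bad.sp), hex(bad.mem[0x03007F00]))     # sp unchanged, caller's word replaced
```

That would fit the "works sometimes, but not always" symptom: whether anything breaks depends on whether the overwritten word is needed later.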

And it would be better to use bx lr instead of mov pc, lr. Even if you're not in THUMB mode, so mov pc, lr works fine, it's just not standard. Also, I think when returning from an IRQ handler, it does some register swapping during the bx that it wouldn't do on a mov.

And as for r13/r14 versus sp/lr: in ARM mode it doesn't matter (I use register numbers), but in THUMB it does, because most THUMB instructions can only access r0-r7, and a few can use some of the special registers (sp/lr/pc) specifically. One example is ldr r0, [sp, #1000], a dedicated instruction for loading relative to the stack pointer. Because the base register is implied, it can spend more bits on the immediate value. (Actually the value is stored as 8 bits but shifted left twice, because it's generally aligned to 4 bytes for loading words anyway. The bottom 2 bits are always 0, so you can only use multiples of 4.) I've never tried putting r13 there instead of sp, so it might work too.
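
The "8 bits, shifted left twice" point can be checked with a few lines of arithmetic. This is a sketch of the Thumb SP-relative load encoding as I understand it from the ARM7TDMI documentation (top bits 1001, load bit, 3-bit destination register, 8-bit word offset); the function names are made up for the example.

```python
def encode_thumb_ldr_sp(rd, offset):
    """Encode ldr rd, [sp, #offset] as a 16-bit Thumb opcode (sketch)."""
    if not 0 <= rd <= 7:
        raise ValueError("rd must be r0-r7")
    if offset % 4 != 0 or not 0 <= offset <= 1020:
        raise ValueError("offset must be a multiple of 4 in 0..1020")
    # 0x9800 = 1001 1 in the top five bits: SP-relative transfer with the load bit set
    return 0x9800 | (rd << 8) | (offset >> 2)


def decode_sp_offset(opcode):
    """Recover the byte offset from the low 8 bits of the opcode."""
    return (opcode & 0xFF) << 2


opcode = encode_thumb_ldr_sp(0, 1000)
print(hex(opcode), decode_sp_offset(opcode))
```

So #1000 fits because 1000 / 4 = 250 is below 256, while something like #1002 or #1024 cannot be encoded in this form.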

I'm an experienced ARM developer and here's the official word from the latest ARM/Thumb Procedure Call Standard (which GCC will follow when you compile C code). This all applies in either mode, ARM or Thumb.

r0-r3 (also known as a1-a4) may be overwritten by any function; it is the *caller's* responsibility to save their values on the stack if they are required to be preserved. They should be used for function parameters (in increasing order) and r0/a1 should be used for the return value.
r4-r12 (also known as v1-v8 and ip) may NOT be overwritten by any function. If you change the value of any of those registers during execution of your function, you must push them to the stack first, and pop them back afterward.
r13, also known as sp, is the stack pointer and you shouldn't change it directly unless you know exactly what you are doing.
r14, also known as lr, is the link register, and you need to save it if you call a second function from inside your function.

The correct way to save and load these registers is as follows:
ARM:
stmfd sp!,{r4-r9,lr} (whichever of the r4-r12 registers you change)
... do some stuff that corrupts r4-r9 ...
bl someotherfunction (destroys lr)
ldmfd sp!,{r4-r9,pc}

Thumb:
push sp!,{r4-r7,lr} ( you can only change up to r7 directly in thumb )
.. do some stuff that corrupts r4-r7 ...
bl someotherfunction (destroys lr)
pop sp!,{r4-r7,pc}

You do not need a return instruction - just pop the value that was stored from lr directly back into pc. This is guaranteed by the ATPCS and the ARM7 specs to restore thumb/arm state correctly and return to the right place whatever happens.

If you don't need to push any values to the stack - you do not change r4-r12 and you do not call any functions - then the correct way to return is:
bx lr (in arm or thumb)

You should never mov pc,lr in a program which might conceivably have both arm and thumb code (this includes anything that links with any libraries, mostly), because it will not restore the processor state. bx lr will work in either thumb or arm and will work whether the caller was thumb or arm code, and it is no slower than mov pc,lr.

There you go, that's ARM's official word on how it's done. If you don't follow this standard you will find that your asm code will NOT interoperate with C/C++ code, because the registers used by gcc will not match yours. I have written substantial amounts of interworking code (thumb and arm mixed) and can call C functions from my asm and my asm functions from my C without any problems just by following this.

You can get the ATPCS from ARM's website, btw, but it's a bit complex. Ignore all sections except for the base standard and interworking, as floating point, relocation..etc are not applicable to the gameboy; also, anything it mentions as being part of ARM architecture v5 is also not present on the gba as the gba's processor (ARM7TDMI) is an ARMv4.

Hope that helps everyone!

Torne

Just 2 or 3 questions,

When you are using both Thumb and ARM code, you say that we must use bx lr. Before using the bx instruction, mustn't you pop the older value of lr that is on the stack:

Code:

push sp!,{r4-r7,lr}
bl someotherfunction (destroys lr)
....
pop sp!,{r4-r7,lr}
bx lr

I've also seen in the Goldroad manual how to create jump tables, but I don't see why we should do this; using labels should be enough for creating functions, no?

Finally, I still don't really understand how to switch from ARM to Thumb code. What really happens in the hardware, and what must we add to the code so that it works (detailed and technical explanations are welcome)?

PS: Why push/pop in Thumb code and stmfd/ldmfd in ARM? (Surely a silly question, but I don't see why, actually...)

funkeejeffou wrote:

Code:

push sp!,{r4-r7,lr}
bl someotherfunction (destroys lr)
....
pop sp!,{r4-r7,lr}
bx lr

If you're quoting me, there was some pretty bad stuff in that post^_^;
Do what Torne says and use

Code:

push sp!, {r4-r7, lr}
//do stuff
pop sp!, {r4-r7, pc} //restore regs and return all at once

And push/pop are special THUMB instructions that do stmfd r13!, rList (can include r0-r7 and lr) and ldmfd r13!, rList (can include r0-r7 and pc), because THUMB only has ldmia/stmia (not all the variations ARM has, including ldmfd/stmfd). Being able to include lr when storing and pc when loading is there just for this kind of use.

Quote:

I've also seen in the Goldroad manual how to create jump tables, but I don't see why we should do this; using labels should be enough for creating functions, no?

Jump tables are good for things like switch statements. You have an array of addresses, so you can just take the value of your variable, and branch to the label stored in that entry of your jump table. That way you don't have to have a specific test and conditional branch to each label.
For regular functions, yes, branch straight to the label. Unless it's in a different section (like going from ROM to IWRAM), and then a regular bl can't go that far from the pc. I don't know the best way to do a long branch, so hopefully someone else will be able to fill you (and me) in on that.
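
As a rough illustration of the jump-table idea (an array of code addresses indexed by a value, instead of a chain of compare-and-branch), here is the same pattern in Python, with a list of functions standing in for the table of labels. The handler names and the state numbering are invented for the example.

```python
def on_idle():
    return "idle"


def on_walk():
    return "walk"


def on_jump():
    return "jump"


# The "jump table": entry N holds the code to run for value N.
JUMP_TABLE = [on_idle, on_walk, on_jump]


def dispatch(state):
    """Index into the table and call the entry, like loading an address and branching to it."""
    if not 0 <= state < len(JUMP_TABLE):
        raise IndexError("state out of range")   # asm would need the same bounds check
    return JUMP_TABLE[state]()


print(dispatch(1))
```

The cost of dispatch stays the same however many entries there are, which is the advantage over testing each case in turn.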

Quote:

Finally, I still don't really understand how to switch from ARM to Thumb code. What really happens in the hardware, and what must we add to the code so that it works (detailed and technical explanations are welcome)?

Since all instructions are either 2 or 4 bytes long, the lowest bit of an instruction address is never needed for the address itself, so it's used as the mode flag. If bit 0 of the address you're branching to is set to 1, the CPU goes into THUMB mode; if it's 0, it goes into ARM.
When I wrote that post before, I thought only bx could switch modes, but according to Torne, push lr/pop pc will work too.
For calling THUMB functions from ARM, in the function header (where you put the .global funcName, .thumb, .section type stuff), put .thumb_func to tell it to set the lower bit of the label to 1, so the CPU will switch to THUMB when you bl to it.
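
The bit 0 rule for bx can be written down as a tiny model. This is a simplified sketch, not a description of the real pipeline: the function names are invented, and an ARM target with bit 1 set is simply rounded down here. It also shows why mov pc, lr cannot be used for an interworking return: the state stays whatever it was.

```python
def bx(target):
    """Model of bx: bit 0 of the target picks the state, the rest is the address."""
    if target & 1:
        return "thumb", target & ~1   # halfword-aligned Thumb address
    return "arm", target & ~3         # word-aligned ARM address (simplified)


def mov_pc(state, target):
    """Model of mov pc, <reg>: the state never changes, the low bits are dropped."""
    mask = ~1 if state == "thumb" else ~3
    return state, target & mask


# lr as left by a Thumb caller has bit 0 set.
print(bx(0x08000101))             # switches to Thumb
print(mov_pc("arm", 0x08000101))  # stays in ARM and runs the Thumb code as ARM
```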

Two things:
Only bother to push/pop the registers you change. You don't need to save and restore values you have not altered.

Also, I made two mistakes in my previous (long time ago) post. You can't restore the CPU mode using pop/ldmfd on the GBA; this feature only exists in ARM architecture v5T, and the GBA's processor (ARM7TDMI) is only architecture v4T. Also, you don't have to save the value of r12/ip, because the procedure call standard doesn't require it. (it's the intra-procedure call scratch register)

To do an interworking return from ARM, you will in fact need to do what funkeejeffou asked.. pop the value of lr then bx lr. However, this is not possible in THUMB, as the 'pop' instruction may not take lr as an argument.

Having said that, here's a more accurate answer which actually works. *blush*

ARM function which calls other functions:
stmfd sp!,{r4-r9,lr} (whichever of the r4-r11 registers you change)
... do some stuff that corrupts r4-r9 ...
bl someotherfunction (destroys lr)
ldmfd sp!,{r4-r9,lr}
bx lr

ARM function which does not call other functions:
stmfd sp!,{r4-r9} (whichever of the r4-r11 registers you change)
... do some stuff that corrupts r4-r9 ...
ldmfd sp!,{r4-r9}
bx lr

Thumb function which calls other functions:
push {r4-r7,lr} (whichever of the r4-r7 registers you change)
... do some stuff that corrupts r4-r7 ...
bl someotherfunction (destroys lr)
pop {r4-r7}
pop {r3} (some r0-r3 register which does not contain your return value)
bx r3

Thumb function which does not call other functions:
push {r4-r7} (whichever of the r4-r7 registers you change)
... do some stuff that corrupts r4-r7 ...
pop {r4-r7}
bx lr

If you don't change the values of any registers except r0-r3 and r12, and don't call any functions, then you don't need the stmfd/ldmfd/push/pop at all, you can just bx lr at the end. These are called 'leaf' functions and are nice and fast =)

Yes, I know the thumb interworking return (two pops and a bx) is ugly, but it's the one generated by GCC and the one referred to in the ARM v4T notes. There is no way to do it 'faster'.

I still recommend that anyone programming assembler for the GBA gets a copy of the ARM/Thumb Procedure Call Standard from www.arm.com and reads it, especially the section on interworking.

As for your other questions: DekuTree64 is right about the bottom-bit stuff with bx, however he misunderstands the purpose of .thumb_func (which I should also note is a GNU as directive and I have no idea if there is a GoldRoad equivalent).

If you call functions by bl FunctionName, all you need do to make them work, whether the source/destination are ARM/Thumb, is use interworking returns as shown above. The toolchain fixes it all for you by inserting shims at link time. For example, if the source is in Thumb and destination is in ARM, the linker will automatically redirect your 'bl FunctionName' to a 'bl __FunctionName_from_thumb'. That routine is automatically created and added to your code (it consists of 'bx pc; nop; b FunctionName' if you're wondering, which is a slightly complicated code sequence to understand *grin*). The converse is done for ARM->Thumb calls.

The return has to be handled by your code by one of the interworking return code sequences shown above.

However, if you call functions through pointers (i.e. if you have the value of the FunctionName symbol stored in a register, and you branch through that register to call them), which is what you do for jump tables etc., then things get tricky. You need three things. The first is interworking returns on all your functions, as we've discussed. The second is for the value of the FunctionName symbol to have the lowest bit set to 1 when it is a Thumb function. This is done automatically in GNU as if you declare the function as .thumb_func. The last is to call the function using bx instead of bl, in order to make the (possible) switch between the two instruction sets.

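This is inconvenient, though, for function calls that need to return; bx doesn't save the return address in lr, and the 'blx' instruction doesn't exist on the GBA (only on more recent ARM cores). What you need to do to call through a pointer from ARM code is shown below. (From Thumb code, mov lr, pc leaves bit 0 of lr clear, so a bx lr return would come back in ARM state; there the usual approach is to bl to a one-instruction 'bx r3' helper instead.)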

mov lr, pc (sets the return address to two instructions ahead, i.e. 'next instruction', because of the ARM core pipelining)
bx r3 (or whichever register contains your function pointer)
.. next instruction ..

Annoying. Ah well.
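
The "two instructions ahead" arithmetic behind mov lr, pc can be checked quickly. This is a sketch under the usual ARM7TDMI rule that reading pc gives the current instruction address plus 8 in ARM state and plus 4 in Thumb state; the address used is hypothetical.

```python
def arm_mov_lr_pc(mov_addr):
    """ARM state: pc reads as the current instruction address + 8."""
    return mov_addr + 8


def thumb_mov_lr_pc(mov_addr):
    """Thumb state: pc reads as the current instruction address + 4, with bit 0 clear."""
    return mov_addr + 4


MOV_ADDR = 0x08000200           # hypothetical address of the mov lr, pc

arm_lr = arm_mov_lr_pc(MOV_ADDR)
arm_next = MOV_ADDR + 4 + 4     # mov and bx are 4 bytes each in ARM
thumb_lr = thumb_mov_lr_pc(MOV_ADDR)
thumb_next = MOV_ADDR + 2 + 2   # mov and bx are 2 bytes each in Thumb

print(hex(arm_lr), hex(arm_next))
print(hex(thumb_lr), hex(thumb_next), thumb_lr & 1)
```

In both states lr ends up at the instruction after the bx. In the Thumb case the value has bit 0 clear, so a callee returning with bx lr would treat it as an ARM address, which is worth keeping in mind before using the sequence from Thumb code.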

That should cover everything; how to do interworking direct and indirect calls, and interworking returns. Yes, interworking is slightly painful at times.

The only way to avoid the complications of interworking is not to write interworking code. Omit the -mthumb-interwork argument to the assembler in order that your code is not marked as interworking-capable. You will then have to deal with any ARM/Thumb instruction set switching within your code for yourself, which *can* be simpler but requires a lot of care. Also, you won't be able to link against any interworking code. =)

This has got a bit vague thanks to my mistakes, I think; my exams finished yesterday, so at some point soon I may well write a guide to GBA assembler programming with respect to the ATPCS. Following calling standards is important as without them, you can't link your code to anyone else's and expect it to work, and you can't link to any C code either.

Torne

Thanks a lot, really helpful!
And, yes Torne, please write a GBA assembly tutorial cause everything on the net is "slightly" incomplete...

I don't want to annoy you again :) but could someone just explain to me what textareas and pools (@pool, @ltorg...) really are? I need to place my code and variables into specific RAM or ROM locations, and things do not seem so simple. Of course, code is always in ROM at the beginning; I just want to know how to copy it properly to RAM (without overwriting previous data).
Also, I've heard the stack is in Iwram (beginning or end). Does it have a max size, or can we push as much as we want (within the limit of 32K)?
If so, it means that we must keep track of all code and variables in Iwram so that the stack and other data don't overwrite each other (we must calculate the instruction length + operand length for each line of Iwram code).
Can we create a stack in Ewram (for Thumb code it should be more efficient, no?)?

As you can see, I'm a newbie to ASM, but I've coded a lot in C (mostly for PC). Things are conceptually different when you code in ASM, and I'm finally beginning to understand what really happens in hardware when code executes. It's really a pain in the *** to get good docs on GBA ASM coding, plus I'm using goldroad with its special syntax.
All that to thank you for answering my questions because I don't know how I would do without your help.
Enough Talk, have a good day ;)

funkeejeffou wrote:

could someone just explain to me what textareas and pools (@pool, @ltorg...) really are? I need to place my code and variables into specific RAM or ROM locations, and things do not seem so simple.

ltorg and pool are the same thing. They force goldroad to process all the indefinite values from ldr (the constants it still has to place in a literal pool). If you use a lot of raw data (dub or incbin) you'll want to use those directives. But goldroad will tell you when it needs them while assembling.

textarea just tells goldroad for which type of RAM/ROM the binary should be assembled. Putting textarea $02000000 at the beginning of your file WON'T necessarily put it into exWRAM (unless you are using multiboot). You still have to copy the whole binary into that RAM. Using two address labels and doing a multiple load and multiple store should do the trick. After that, don't forget to adjust the PC. Maybe a DMA copy would also be good. I don't know which one is faster for this kind of copy.

Peace. :)
_________________
http://www.nausicaa.net

Then if we copy the code to Iwram manually, what's the point of putting a textarea 0x2000000?
Also, you're saying to update the PC, how do you do this (I guess that the PC will in fact point to the code in ROM, so we must change it as you say)?

Or maybe the point of putting a textarea 0x2000000 is to set the PC to the right address after all. Can anyone confirm?

And when should we split the pool?
Where do we really place the main textarea (after all the bin files)? Are the functions included in it?
Is the "b start" in the textarea?
How many pools does a program usually have (I would say one...)?

How about reading the goldroad manual first. Post any questions you then still have.

Already read it...
The explanations are not very detailed, and some of the code is buggy.
Can't really call that a manual.

Sorry, more questions on this topic...

This is something I've been thinking about for some time, and since we are talking about moving code to RAM... I can't find out what the right way to do that is (to me it's not as trivial as simply moving code).

Correct me if I'm wrong but won't all the labels be incorrect? What happens with all the jumps to labels that are pointing to ROM addresses? They'll still point to ROM when we move all the code to RAM won't they??

funkeejeffou wrote:

Then if we copy the code to Iwram manually,
And when should we split the pool?
Where do we really place the main textarea (after all the bin files)? Are the functions included in it?
Is the "b start" in the textarea?
How many pools does a program usually have (I would say one...)?

I can answer this. I started a while ago coding in ASM (just ARM, have to start with THUMB soon), using Goldroad and following GBAguy's tutorial (thanks, mate!).

The fact is, I started coding a game completely in ASM. I reached a point where the pool was too far away to be reached by my ldr instructions. Then Goldroad just issued an error.

So far I've used 3 or 4 different pools. I had to place them in between code and had to make sure those pools were not executed (branching).

Hope it helps.

Thanks TurboHz, yes it helps.

About moving code into RAM: I've been trying some code of mine over the last few hours and I can tell you what I've understood (maybe it's wrong):

When goldroad assembles your code, EVERY single piece of code goes to ROM (starting at address 0x8000000).
So yes, you must manually move your code to the correct location (Iwram or Ewram).
If that were all you did, then yes, the labels would be wrong, and the code you placed in RAM would still point at addresses in ROM.
That's why, for all the code that you want to place in RAM, you should put a "textarea 0x3000000" (e.g. for Iwram). At the beginning I thought that this would be enough for the assembler to copy the code to RAM by itself (I said at the beginning...), but in fact the textarea feature is just there to give a base address to your lines of code, so that all your labels correctly point to the code in RAM ONCE it has been copied there. Also, be careful when copying your code: you should copy it in the same order in which it is declared in the textarea, otherwise the labels and code will be in RAM but won't correspond.
Also, spill the pool at the end with @ltorg, so that you can declare code somewhere else (ROM or Iwram) by using new textareas.

Code:

start_of_iwramcode
@textarea 0x3000000
here code and variables located in IWRAM...
@ltorg
@endarea
end_of_iwramcode

start_of_ewramcode
@textarea 0x2000000
here code and variables located in EWRAM...
@ltorg
@endarea
end_of_ewramcode

@textarea 0x8000000
here code and variables located in ROM
@ltorg
@endarea

then copy the data by doing something like this :

Code:

ldr r0, =start_of_iwramcode
ldr r1, =end_of_iwramcode
mov r2, #0x3000000
loop
ldr r3, [r0]
str r3, [r2]
add r0, r0, #4
add r2, r2, #4
cmp r0, r1
blt loop

Somebody correct me please if I'm wrong, otherwise, hope this helps.
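
One detail worth checking in a copy loop like this is the comparison at the bottom. The following is a small model of the loop, counting how many words get copied for a "less than" versus a "less than or equal" end test. The addresses are made up, and the end label is assumed to sit just past the last word of the block.

```python
def words_copied(start, end, keep_going):
    """Model of the copy loop: copy one word, advance by 4, then test whether to loop again."""
    src = start
    count = 0
    while True:
        count += 1          # stands for the ldr/str pair
        src += 4            # stands for add r0, r0, #4
        if not keep_going(src, end):
            break
    return count


START = 0x08000100          # hypothetical start label
END = 0x08000110            # hypothetical end label: 16 bytes, i.e. 4 words, later

print(words_copied(START, END, lambda a, b: a < b))    # loop while src < end
print(words_copied(START, END, lambda a, b: a <= b))   # loop while src <= end
```

With the "or equal" test the loop runs once more than the block is long, so it would also copy the word that sits at the end label. Note too that a loop of this shape always copies at least one word, even for an empty block.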

Well... I don't quite see your problem with the goldroad manual. Everything you just posted you'd be able to find there.

ASM > Calling a procedure from within another procedure

#5511 - TurboHz - Wed Apr 30, 2003 11:58 am

#5513 - Touchstone - Wed Apr 30, 2003 1:08 pm

#5518 - TurboHz - Wed Apr 30, 2003 1:51 pm

#5522 - DekuTree64 - Wed Apr 30, 2003 4:46 pm

#5533 - torne - Wed Apr 30, 2003 10:54 pm

#6987 - funkeejeffou - Thu Jun 05, 2003 7:50 pm

#6992 - DekuTree64 - Thu Jun 05, 2003 8:44 pm

#7023 - torne - Fri Jun 06, 2003 11:44 am

#7026 - funkeejeffou - Fri Jun 06, 2003 12:23 pm

#7028 - arundel - Fri Jun 06, 2003 12:52 pm

#7030 - funkeejeffou - Fri Jun 06, 2003 1:11 pm

#7037 - arundel - Fri Jun 06, 2003 3:50 pm

#7038 - funkeejeffou - Fri Jun 06, 2003 4:31 pm

#7041 - TurboHz - Fri Jun 06, 2003 5:17 pm

#7042 - TurboHz - Fri Jun 06, 2003 5:27 pm

#7044 - funkeejeffou - Fri Jun 06, 2003 7:03 pm

#7047 - arundel - Fri Jun 06, 2003 9:48 pm