gbadev.org forum archive

So far I have various DS projects written in C, which work nicely. The ARM7 and ARM9 code is compiled into two separate .o files using arm-elf-g++. The same program then turns them into .elf files, then arm-elf-objcopy turns them into arm7.bin and arm9.bin which are fed into ndstool to make a .nds file.

The question is where would a .s file fit in here? I compile it to another .o file, and then what? I haven't experimented with GCC much; just basic stuff.

Also, when I compile something using the -fPIC option, how exactly does it work? Does the generated code always use relative memory accesses, and if so, where does it get the base address? (What I'm looking to do here is compile one or two functions into a file, then have my program load it into memory and have it work just as if these functions had been in its own source, save of course for the fact that they'd have no name.)

Yeah, just assemble a .s file into a .o and it links the same as if it came from a .c file or whatever. Not sure how much you know about compilers, but one useful thing to wrap your head around is that global functions and variables are referenced by literal symbol names in the .o files, and aren't actually converted to addresses until linking. That's basically what static linking is. I know next to nothing about dynamic linking, but there was a thread about it a while back if you're curious about it.
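To make the "names first, addresses later" idea concrete, here is a toy model of what a static linker does with symbol names. It is only a sketch: the Symbol struct, resolve_symbol, demo_symbols and the addresses are all made up for illustration and are not how ld is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A defined symbol: its literal name plus the address assigned at link time. */
typedef struct {
    const char *name;
    uint32_t    address;
} Symbol;

/* Look a symbol up by name. Returns 1 and stores its address on success,
   or 0 when no object file defines it (an "undefined reference"). */
static int resolve_symbol(const Symbol *table, size_t count,
                          const char *name, uint32_t *out_address)
{
    assert(table != NULL && name != NULL && out_address != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *out_address = table[i].address;
            return 1;
        }
    }
    return 0;
}

/* Hypothetical symbols, as if gathered from arm9.o and memory.o.
   The addresses are invented for the example. */
static const Symbol demo_symbols[] = {
    { "main",     0x02000100u },
    { "FastCopy", 0x02000400u },
};
```

Calling resolve_symbol(demo_symbols, 2, "FastCopy", &addr) succeeds, while a name that no .o file defines fails the lookup, which is the situation the linker reports as an undefined reference.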

I've never messed around with position-independent code, but I do find the idea interesting. I would assume that it's all done offset-based, and any references to code outside that position-independent block would be absolute addresses. Referencing anything in another position-independent block would be tricky though, but could be done if you make an offset table to each function from the start of the block, and add the base address to that when branching.

EDIT: On second thought, referencing something in another position-independent block wouldn't be any harder than accessing the first one from the static code... If you figure out if/how the compiler does it, I'd be interested to know :)
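The offset-table idea can be sketched in plain C. This is a simulation of the scheme described in this post, not a description of what GCC emits for -fPIC. The names (OffsetTable, build_offset_table, resolve_entry, block_add, block_sub) are hypothetical, and converting function pointers to integers is implementation-defined, so treat it as an illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*BinaryOp)(int, int);

/* Two functions standing in for the contents of a relocatable block. */
static int block_add(int a, int b) { return a + b; }
static int block_sub(int a, int b) { return a - b; }

/* One offset per function, measured from the start of the block. */
typedef struct {
    uintptr_t offsets[2];
} OffsetTable;

/* Record each function's distance from the chosen base address.
   Unsigned arithmetic wraps, so this also works if a function sits below base. */
static OffsetTable build_offset_table(uintptr_t base)
{
    OffsetTable t;
    t.offsets[0] = (uintptr_t)block_add - base;
    t.offsets[1] = (uintptr_t)block_sub - base;
    return t;
}

/* Turn "offset within block" back into a callable pointer by adding the base. */
static BinaryOp resolve_entry(const OffsetTable *table, size_t index, uintptr_t base)
{
    assert(table != NULL && index < 2);
    return (BinaryOp)(base + table->offsets[index]);
}
```

With base set to (uintptr_t)block_add, resolve_entry(&t, 1, base)(7, 4) calls block_sub. In a real loader the base would be wherever the block was copied to, and the table would be stored inside the block itself.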
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

That's what I thought, but where do I add in the .o file in the process of making .elf and .bin files out of them? Like to compile the ARM9 binary, I do this:

Code:

echo Compiling ARM9...
%devkitarm_path%\arm-elf-g++ -g -mcpu=arm9tdmi -mtune=arm9tdmi -mthumb-interwork -I%ndslib_path%\include -DARM9 -fomit-frame-pointer -ffast-math -o%devkitarm_path%\arm9.o -c arm9.c
IF NOT EXIST %devkitarm_path%\arm9.o GOTO done
%devkitarm_path%\arm-elf-g++ -g -mthumb-interwork -mno-fpu -specs=ds_arm9.specs %devkitarm_path%\arm9.o -L%ndslib_path%\lib -lnds9 -o%devkitarm_path%\arm9.elf
IF NOT EXIST %devkitarm_path%\arm9.elf GOTO done
%devkitarm_path%\arm-elf-objcopy -O binary %devkitarm_path%\arm9.elf %devkitarm_path%\arm9.bin
IF NOT EXIST %devkitarm_path%\arm9.bin GOTO done

Where do I reference the new .o file I've created here? My guess would be adding it after "%devkitarm_path%\arm9.o" in the second arm-elf-g++ call, is that correct? And is there an easy way to make the same code available to both CPUs without duplicating it, or would I have to include it in one and just pass the other its address at runtime?

HyperHacker wrote:

Where do I reference the new .o file I've created here? My guess would be adding it after "%devkitarm_path%\arm9.o" in the second arm-elf-g++ call, is that correct?

Yep.

Quote:

And is there an easy way to make the same code available to both CPUs without duplicating it, or would I have to include it in one and just pass the other its address at runtime?

You can set up your makefile to compile the same source file for both CPUs, but there's no easy way I know of for both binaries to call the exact same function in memory.

One way you could do it is by making a function pointer table on the CPU with the functions themselves, and just send the address of the table across to the other CPU. Kind of like this:

Code:

void DoNothing() {}
int AddSomeNumbers(int a, int b) { return a + b; }

void *functionTable[] =
{
(void*)&DoNothing,
(void*)&AddSomeNumbers,
};

Then send it across, and on the other CPU:

Code:

void (*DoNothing)(void);
int (*AddSomeNumbers)(int, int);

/* Takes void** so the table can be indexed; a plain void* cannot be. */
void ReceiveFunctionTable(void **functionTablePtr)
{
DoNothing = (void (*)(void)) (functionTablePtr[0]);
AddSomeNumbers = (int (*)(int, int)) (functionTablePtr[1]);
}

Then you can call DoNothing and AddSomeNumbers from either CPU :)
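A variation on the same idea is to wrap the pointers in a struct, so each entry keeps its real type and the receiving side needs only one cast. This is just a sketch. FunctionTable, sharedTable, remoteTable and the Impl suffixes are names invented here, and how the address gets from one CPU to the other is left out.

```c
#include <assert.h>
#include <stddef.h>

/* One typed slot per shared function. */
typedef struct {
    void (*DoNothing)(void);
    int  (*AddSomeNumbers)(int, int);
} FunctionTable;

/* On the CPU that owns the code. */
static void DoNothingImpl(void) {}
static int AddSomeNumbersImpl(int a, int b) { return a + b; }

static const FunctionTable sharedTable = { DoNothingImpl, AddSomeNumbersImpl };

/* On the other CPU: remember where the table lives. */
static const FunctionTable *remoteTable = NULL;

static void ReceiveFunctionTable(const void *functionTablePtr)
{
    assert(functionTablePtr != NULL);
    remoteTable = (const FunctionTable *)functionTablePtr;
}
```

After ReceiveFunctionTable(&sharedTable), a call looks like remoteTable->AddSomeNumbers(2, 3). Either way, this only works if the code sits in memory both CPUs can reach and uses instructions both can execute.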

That's what I was thinking. Thanks, just two more questions. How do I specify sections in ASM, and what are they all named? Eg when I coded for Game Boy, I could throw in "ORG $1234" and it would generate code that assumes it'll be run from address 0x1234 (mainly for calculating whether short jumps could reach a given label). How would I do it here? I know how to do it in C, which leads me to the next question: What are all the sections' names? You can specify like IWRAM, BSS etc; what names correspond to what addresses? (Or can I just specify an address?)

If you're using devkitPro, here are the main sections:
.text: Main code in ROM (starts at 0x8000000)
.rodata: Data in ROM (starts after .text)
.iwram: Code/data in IWRAM (starts at 0x3000000)
.bss: Uninitialized data in IWRAM (starts after .iwram)
.data: Initialized data in IWRAM (starts after .bss)
.ewram: Code/data in EWRAM (starts at 0x2000000)
.sbss: Uninitialized data in EWRAM (starts after .ewram)

You can specify sections in gcc with attributes, like:

Code:

int thingInEwram __attribute__((section(".ewram")));

__attribute__((section(".iwram"), long_call)) void FunctionInIwram();

The long_call is a bit slower than a normal relative branch, but IWRAM is too far from ROM to jump relatively. Not sure if the compiler will figure out that it can relative-jump if you call one IWRAM function from another.
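To see why ROM to IWRAM is "too far", here is a small range check for an ARM-state B/BL instruction, which encodes a signed 24-bit word offset relative to the instruction address plus 8. That gives roughly ±32 MB of reach. The function name arm_branch_reaches is made up for this sketch, and it doesn't cover Thumb branches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if an ARM-state B/BL at insn_addr can reach target directly.
   The encoded offset is relative to insn_addr + 8 (the pipelined PC),
   must be a multiple of 4, and must fit in a signed 24-bit word count. */
static bool arm_branch_reaches(uint32_t insn_addr, uint32_t target)
{
    assert((insn_addr & 3u) == 0);  /* ARM instructions are word-aligned */

    int64_t offset = (int64_t)target - ((int64_t)insn_addr + 8);
    const int64_t min_offset = -(INT64_C(1) << 25);     /* -32 MB     */
    const int64_t max_offset = (INT64_C(1) << 25) - 4;  /* +32 MB - 4 */

    return (offset % 4 == 0) && offset >= min_offset && offset <= max_offset;
}
```

With the addresses from the list above, arm_branch_reaches(0x08000000, 0x03000000) is false because the two are about 80 MB apart, so the call has to load the full address into a register instead. A call between two functions near the start of IWRAM passes the check.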

There are also overlay sections called .iwram0-.iwram7. They all start at the SAME address (after .data I think), so you can only have one of them loaded into IWRAM at a time. Basically lets you conserve IWRAM if you have some pieces of code that will never be used at the same time.

There are also .ewram0-7, which behave similarly.
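The overlay idea boils down to "several images stored separately, one shared destination". Below is a host-side simulation of that, not real GBA code: overlay_region stands in for the shared IWRAM address, the byte arrays stand in for overlay contents stored in ROM, and every name here (Overlay, overlays, load_overlay, current_overlay) is hypothetical. On real hardware you'd copy between addresses provided by the linker script, which I haven't shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OVERLAY_REGION_SIZE 16

/* Stands in for the single run-time address shared by all overlays. */
static uint8_t overlay_region[OVERLAY_REGION_SIZE];

/* Where an overlay's contents are stored and how big they are. */
typedef struct {
    const uint8_t *load_image;
    size_t         size;
} Overlay;

/* Made-up contents for two overlays. */
static const uint8_t overlay0_image[] = { 0x10, 0x11, 0x12, 0x13 };
static const uint8_t overlay1_image[] = { 0x20, 0x21 };

static const Overlay overlays[] = {
    { overlay0_image, sizeof overlay0_image },
    { overlay1_image, sizeof overlay1_image },
};

static int current_overlay = -1;  /* -1 means nothing loaded yet */

/* Copy one overlay over the shared region, replacing whatever was there. */
static void load_overlay(int index)
{
    assert(index >= 0 && (size_t)index < sizeof overlays / sizeof overlays[0]);
    assert(overlays[index].size <= OVERLAY_REGION_SIZE);
    memcpy(overlay_region, overlays[index].load_image, overlays[index].size);
    current_overlay = index;
}
```

Loading overlay 1 after overlay 0 overwrites the start of the region, so anything from overlay 0 must not be called or read until it is loaded again. That's the trade-off for the saved RAM.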

Alright, thanks a lot! :-)

I will warn you that you won't be able to use the .sbss section if you're using appended GBFS with multiboot, as I know of no way to make the startup code relocate the GBFS file to after the end of .sbss before the startup code overwrites .sbss.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

tepples wrote:

...relocate the GBFS file to after the end of .sbss before the startup code overwrites .sbss.

Brilliance :)
I've been annoyed by my DS binary being larger than necessary because I had to pad out .bss for that. It's times like this that make writing custom startup code worthwhile.

DekuTree64 wrote:

HyperHacker wrote:

Where do I reference the new .o file I've created here? My guess would be adding it after "%devkitarm_path%\arm9.o" in the second arm-elf-g++ call, is that correct?

Yep.

OK, so why does it think the functions declared in this file don't exist?

in memory.s:

Code:

.arm
.align
.section .iwram
.global FastCopy
.type FastCopy,function

FastCopy:
[some asm code here]

in main.h:

Code:

#if ARM9
[...]
extern void FastCopy();
#endif

It creates memory.o, which contains the string 'FastCopy', but it tells me no such function exists (undefined reference).

Are you using C++? It normally expects function names to be mangled according to their arguments, but assembly labels are output as-is. You'll get similar errors mixing .c and .cpp files too. Use

Code:

extern "C" void function();

or to be .c and .cpp compatible, and not have to type as many "C"s if you have a lot of functions,

Code:

#ifdef __cplusplus
extern "C" {
#endif

extern void function();
extern void anotherFunction();

#ifdef __cplusplus
}
#endif

to tell it that the labels are unmangled.
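To show what "mangled" means in practice, here is a sketch that builds the name g++ would look for, covering only the simplest case: a global function with no parameters. It follows the Itanium C++ ABI scheme that GCC uses (prefix _Z, the name's length, the name, then v for a void parameter list). The helper name mangle_void_function is made up, and real mangling handles far more cases than this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the Itanium-ABI mangled name for a global function that takes
   no parameters, e.g. "FastCopy" -> "_Z8FastCopyv".
   Returns 1 on success, 0 if the output buffer is too small. */
static int mangle_void_function(const char *name, char *out, size_t out_size)
{
    assert(name != NULL && out != NULL);
    int written = snprintf(out, out_size, "_Z%zu%sv", strlen(name), name);
    return written >= 0 && (size_t)written < out_size;
}
```

So a C++ file that declares FastCopy without extern "C" asks the linker for _Z8FastCopyv, while the assembly file only defines FastCopy, and the two never match.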

Yay! Now only one more error. :-p

Quote:

F:\dos\DevKitPro\devkitARM\bin\arm9.o: In function `Interrupt()':
F:\Programs\Source Codes\DS\bootgba/arm9.c:891: relocation truncated to fit: R_ARM_PC24 against symbol `FastCopy' defined in .iwram section in F:\dos\DevKitPro\devkitARM\bin\memory.o

Apparently it doesn't like me putting that ".section .iwram" there. Is this even necessary on DS? (Isn't code put in IWRAM by default?)

Ah, this is for DS. The sections are similar on both, except DS has .itcm and .dtcm (instruction and data tightly coupled memory) instead of .iwram.

I get the same with .itcm. With .dtcm (which doesn't really seem like a good place for code) it compiles but the function seems to go into an endless loop. I can't tell precisely what's going on (my graphics code is FUBARed at the moment so there's nothing displayed) but the power LED is supposed to blink for a few seconds at startup, and it just never stops blinking.

Yeah, DTCM is not executable. I guess you just found out what happens if you try :)

Try using .align 2 instead of just .align. I think that's the same error I got when putting things in .rodata way back when and that turned out to be what it was mad about (although I'm not sure exactly why).

Still no go. Tried 4 too. Is there maybe something I'm supposed to be adding to the prototype when I put it in ITCM? Seems like it's trying to use too short a jump.

Could be. Depends on what address ITCM is mapped to with libnds. I think you put __attribute__((long_call)) at the end of the prototype, just before the semicolon to do long calls. Maybe one less set of parentheses, and you may have to specify a section along with it, but I don't remember the exact syntax. Just search the forum for it.

Well I got this far:

Code:

extern void FastCopy() __attribute__((section (".itcm"))) __attribute__((long_call));

With or without the section attribute, it still goes into a loop, but it at least compiles now. Do I have to map ITCM somewhere myself?

ASM > Using both ASM and C

#73276 - HyperHacker - Fri Feb 24, 2006 9:38 am

#73279 - DekuTree64 - Fri Feb 24, 2006 10:26 am

#73341 - HyperHacker - Fri Feb 24, 2006 10:53 pm

#73344 - DekuTree64 - Fri Feb 24, 2006 11:16 pm

#73358 - HyperHacker - Sat Feb 25, 2006 1:20 am

#73469 - DekuTree64 - Sat Feb 25, 2006 10:23 pm

#73481 - HyperHacker - Sun Feb 26, 2006 12:35 am

#73500 - tepples - Sun Feb 26, 2006 3:05 am

#73503 - DekuTree64 - Sun Feb 26, 2006 3:36 am

#74743 - HyperHacker - Tue Mar 07, 2006 7:13 am

#74747 - DekuTree64 - Tue Mar 07, 2006 8:08 am

#74753 - HyperHacker - Tue Mar 07, 2006 11:29 am

#74791 - DekuTree64 - Tue Mar 07, 2006 7:27 pm

#74817 - HyperHacker - Tue Mar 07, 2006 11:46 pm

#74819 - DekuTree64 - Tue Mar 07, 2006 11:59 pm

#74820 - HyperHacker - Wed Mar 08, 2006 12:06 am

#74828 - DekuTree64 - Wed Mar 08, 2006 1:10 am

#74835 - HyperHacker - Wed Mar 08, 2006 2:33 am