gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

DS development > Loading code on demand possible? Something like DLL's ...

#147534 - Peter - Sat Dec 22, 2007 11:32 am

This is something that has been spinning around in my head for quite a while.

In my experience, performance is not really an issue on the DS, but the limited amount of memory is. 4MB is not that bad, but since you have to hold everything in RAM, the full 4MB isn't available. That's when I came up with the thought that being able to load code on demand would help reduce memory usage. Unless that means you have to hold the same code several times in memory, heh.

For example, our .text section was about 600-700KB. The menu system and mini-games code were about 200KB afaik, but we never actually needed them in-game. If we had been able to separate the menu and mini-games code from the rest, we would have had circa 5% more memory available in-game (twelve 128x128 textures in 8bit!!!).

I already did some research back then on how DLLs / code sharing work, but it's darn complicated imo (reusing global variables from modules etc, hell!). Has anybody already done something like this, or is it maybe already supported by devkitARM and I just don't know it?
_________________
Kind Regards,
Peter

#147535 - keldon - Sat Dec 22, 2007 11:55 am

Huh, why would your code and data be in RAM? If the data is declared as const then it will be stored in ROM, and your functions too unless you explicitly tell it to load into RAM ... or have I misread your post?

#147536 - Peter - Sat Dec 22, 2007 12:05 pm

keldon wrote:
Huh, why would your code and data be in RAM? If the data is declared as const then it will be stored in ROM

The Nintendo DS does not have ROM like the GBA, it's all (code, data, etc) located in main memory.
_________________
Kind Regards,
Peter

#147538 - simonjhall - Sat Dec 22, 2007 12:14 pm

Take a look at the DLDI driver system, or talk to chishm and co about it. This'll probably be just what you need, except you're patching your executable at runtime. The link scripts would be pretty similar, and compiling the loadable parts of code is probably the hardest part! But here most of the work is done for you.
_________________
Big thanks to everyone who donated for Quake2

#147539 - chishm - Sat Dec 22, 2007 12:36 pm

DLDI functions are called in one direction only. The functions inside a DLDI cannot call functions from the host app. You'll probably want to be able to go both ways.

I suggest looking into using code overlays. Read up on using overlays in linker scripts, you can look in gba_cart.ld included with a DevkitARM install for an example. Then use arm-eabi-objcopy to exclude the overlay sections when creating the ARM9 binary so that you don't actually have the code sitting in memory until you need it. You also need to use arm-eabi-objcopy to create a separate bin file for each code overlay. Write a function to load the relevant overlay from disc when needed, and you're almost done. Data won't be included in the overlay, so I suggest you store any large constant data on disc and load it as needed.
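
The "function to load the relevant overlay" step could look roughly like the sketch below. It is untested on hardware and makes assumptions: the name load_overlay, the use of plain stdio, and passing the destination and its capacity in as parameters are all illustrative choices, not anything devkitARM provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: reads the overlay image at `path` into `dest`, which
// can hold `capacity` bytes. Returns the number of bytes loaded, or 0 on
// failure (missing file, empty file, short read, or image too large).
std::size_t load_overlay(const char* path, void* dest, std::size_t capacity)
{
    assert(dest != nullptr);

    std::FILE* f = std::fopen(path, "rb");
    if (!f) return 0;

    std::size_t loaded = 0;
    // Check the file size first, so an oversized overlay is rejected before
    // anything in the overlay region is overwritten.
    if (std::fseek(f, 0, SEEK_END) == 0) {
        long size = std::ftell(f);
        if (size > 0 && static_cast<std::size_t>(size) <= capacity
            && std::fseek(f, 0, SEEK_SET) == 0) {
            loaded = std::fread(dest, 1, static_cast<std::size_t>(size), f);
            if (loaded != static_cast<std::size_t>(size)) loaded = 0;
        }
    }
    std::fclose(f);

    // Assumption: on the real ARM9 the caches would also need syncing for
    // this region before any of the freshly loaded code is called.
    return loaded;
}
```

Rejecting the file on size before reading means a bad overlay never clobbers whatever is currently loaded.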
_________________
http://chishm.drunkencoders.com
http://dldi.drunkencoders.com

#147542 - wintermute - Sat Dec 22, 2007 1:17 pm

The DLDI system is a bit awkward for this kind of situation. Really you end up writing an OS layer to handle loading and relocation, not to mention calling functions from the host layer, which I can't help feeling would be better handled by implementing an ELF loader. devkitARM is designed primarily for a statically linked bare metal system, and I'm not really convinced there's sufficient gain from attempting that particular route.

Something we've been experimenting with lately is the use of overlays which could be loaded from the file system. On the surface this seems much less complicated than other methods so it probably warrants some more investigation.

This method uses some linkscript magic to place overlay code in separate sections, which are later extracted individually using objcopy. It's similar to how the .itcm.c/cpp files currently work.

First, add an extra exclusion pattern to the main .text section.

Code:

   .text :   /* ALIGN (4): */
   {
      *(EXCLUDE_FILE (*.itcm* *.overlay*) .text)


Then later, after the .bss section we generate the overlays.

Code:

   __ewram_overlay_lma = __bss_lma + SIZEOF(.bss);

   OVERLAY ALIGN(4) : NOCROSSREFS AT (__ewram_overlay_lma)
   {
      .overlay0 { *(.overlay0) *.overlay0*(.text) . = ALIGN(4); }
      .overlay1 { *(.overlay1) *.overlay1*(.text) . = ALIGN(4); }
      .overlay2 { *(.overlay2) *.overlay2*(.text) . = ALIGN(4); }
      .overlay3 { *(.overlay3) *.overlay3*(.text) . = ALIGN(4); }
      .overlay4 { *(.overlay4) *.overlay4*(.text) . = ALIGN(4); }
      .overlay5 { *(.overlay5) *.overlay5*(.text) . = ALIGN(4); }
      .overlay6 { *(.overlay6) *.overlay6*(.text) . = ALIGN(4); }
      .overlay7 { *(.overlay7) *.overlay7*(.text) . = ALIGN(4); }
      .overlay8 { *(.overlay8) *.overlay8*(.text) . = ALIGN(4); }
      .overlay9 { *(.overlay9) *.overlay9*(.text) . = ALIGN(4); }
   }

   PROVIDE (__overlay0_size = SIZEOF(.overlay0));
   PROVIDE (__overlay1_size = SIZEOF(.overlay1));
   PROVIDE (__overlay2_size = SIZEOF(.overlay2));
   PROVIDE (__overlay3_size = SIZEOF(.overlay3));
   PROVIDE (__overlay4_size = SIZEOF(.overlay4));
   PROVIDE (__overlay5_size = SIZEOF(.overlay5));
   PROVIDE (__overlay6_size = SIZEOF(.overlay6));
   PROVIDE (__overlay7_size = SIZEOF(.overlay7));
   PROVIDE (__overlay8_size = SIZEOF(.overlay8));
   PROVIDE (__overlay9_size = SIZEOF(.overlay9));

   /* find the largest allocated overlay section */
   __ewram_overlay_alloc = MAX( SIZEOF(.overlay0) , SIZEOF(.overlay1) );
   __ewram_overlay_alloc = MAX( __ewram_overlay_alloc , SIZEOF(.overlay2) );
   __ewram_overlay_alloc = MAX( __ewram_overlay_alloc , SIZEOF(.overlay3) );
   __ewram_overlay_alloc = MAX( __ewram_overlay_alloc , SIZEOF(.overlay4) );
   __ewram_overlay_alloc = MAX( __ewram_overlay_alloc , SIZEOF(.overlay5) );
   __ewram_overlay_alloc = MAX( __ewram_overlay_alloc , SIZEOF(.overlay6) );
   __ewram_overlay_alloc = MAX( __ewram_overlay_alloc , SIZEOF(.overlay7) );
   __ewram_overlay_alloc = MAX( __ewram_overlay_alloc , SIZEOF(.overlay8) );
   __ewram_overlay_alloc = MAX( __ewram_overlay_alloc , SIZEOF(.overlay9) );

   /* if we use . here then _end points to overlay_lma + SIZEOF(overlay0) ... SIZEOF(overlay9) */

   _end = __ewram_overlay_lma + __ewram_overlay_alloc ;
   __end__ = __ewram_overlay_lma + __ewram_overlay_alloc ;
   PROVIDE (end = _end);
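
Besides the `*.overlay0*` file-name match, the `*(.overlay0)` pattern in the script above also collects any input section named `.overlay0`, so a single function could presumably be placed in an overlay with GCC's section attribute. A minimal sketch of that idea follows; the macro IN_OVERLAY0, the flag g_overlay0_loaded and both functions are invented for illustration.

```cpp
#include <cassert>

// Hypothetical helper macro: puts a function into an input section named
// ".overlay0", which the *(.overlay0) pattern in the linker script collects.
#if defined(__ELF__)
#define IN_OVERLAY0 __attribute__((section(".overlay0")))
#else
#define IN_OVERLAY0 /* only applied on ELF targets in this sketch */
#endif

// Invented example function that would live in overlay 0.
IN_OVERLAY0 int menu_item_count()
{
    return 5;
}

// Hypothetical flag that the overlay loader would set after loading overlay 0.
bool g_overlay0_loaded = false;

// Code in the main .text section calls the overlay function like any other
// function; it only has to be sure the overlay is in memory first.
int show_menu()
{
    assert(g_overlay0_loaded);
    return menu_item_count();
}
```

Built for the host, this is just an ordinary function in an oddly named section; it only becomes an overlay when linked with a script like the one above.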


We then build the main arm9 binary without the overlay sections using something like

Code:

$(OUTPUT).arm9   :   $(OUTPUT).elf
   @$(OBJCOPY) -O binary -R .overlay0 -R .overlay1 ... $< $@
   @echo build ... $(notdir $@) minus OVERLAY


and extract the overlay sections separately with

Code:

$(OUTPUT).ovl0   :   $(OUTPUT).elf
   $(OBJCOPY) -O binary -j .overlay0 $< $@
   @echo build ... Overlay $(notdir $@)


It does need quite a bit more looking at, but if we can get something halfway sensible together that works, I'll consider adding support for this to devkitARM r22.
_________________
devkitPro - professional toolchains at amateur prices
devkitPro IRC support
Personal Blog

#147543 - keldon - Sat Dec 22, 2007 1:17 pm

Peter wrote:
keldon wrote:
Huh, why would your code and data be in RAM? If the data is declared as const then it will be stored in ROM

The Nintendo DS does not have ROM like the GBA, it's all (code, data, etc) located in main memory.

Oooh, good thing I read this topic then; the memory map is something I haven't really paid much attention to with the DS spec!

#147550 - sajiimori - Sat Dec 22, 2007 9:50 pm

Overlays saved me several hundred KB of RAM on my last project. I used to think they were hackish and lame compared to DLLs, but after finding the right abstraction for them in C++, they were both safe and transparent: using a class would automatically load its overlay and assert that no conflicting objects still existed.
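
One way such an abstraction might be put together is sketched below. This is a guess at the general shape, not the actual code from that project: every name here (OverlayState, OverlayUser, load_overlay_image, Menu, MiniGame) is made up, and the loader is a stand-in that only records which overlay is "loaded".

```cpp
#include <cassert>

// Hypothetical bookkeeping for a single shared overlay region.
struct OverlayState {
    int loaded = -1;     // id of the overlay currently in the region, -1 = none
    int live_users = 0;  // objects alive whose code sits in that overlay
    int load_count = 0;  // how many times an overlay image was (re)loaded
};

inline OverlayState& overlay_state()
{
    static OverlayState s;
    return s;
}

// Stand-in for the real loader, which would read the image into RAM.
inline void load_overlay_image(int id)
{
    overlay_state().loaded = id;
    ++overlay_state().load_count;
}

// Base class for anything whose member functions live in overlay `Id`.
template <int Id>
class OverlayUser {
public:
    OverlayUser()
    {
        OverlayState& s = overlay_state();
        if (s.loaded != Id) {
            // Swapping overlays while objects from the old one are alive
            // would leave them pointing at overwritten code.
            assert(s.live_users == 0 && "conflicting overlay still in use");
            load_overlay_image(Id);
        }
        ++s.live_users;
    }
    ~OverlayUser() { --overlay_state().live_users; }

    OverlayUser(const OverlayUser&) = delete;
    OverlayUser& operator=(const OverlayUser&) = delete;
};

// Invented example classes: constructing one pulls in its overlay.
class Menu : OverlayUser<0> {};
class MiniGame : OverlayUser<1> {};
```

Constructing a MiniGame while a Menu is still alive trips the assert, which is the "no conflicting objects" check; constructing a second Menu reuses the overlay that is already loaded.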

#147576 - Peter - Sun Dec 23, 2007 9:47 pm

Hmm, I don't really get how overlays work.

When I use an overlay section, can the code inside it access symbols from the regular .text section, and does it know their addresses at link time?

Code in the .text section does not know anything about symbols in overlays, I guess. Would I have to map functions from an overlay like functions from a .dll file? I mean something like GetProcAddress, and then use the returned pointer to call the function? Or is the compiler/linker smart enough to magically resolve the symbol names from the overlay when I call one from the .text section?

I tried to find something about relocation tables, as it seems to be a must-read, but couldn't find anything decent. Any recommendations?
_________________
Kind Regards,
Peter

#147581 - M3d10n - Sun Dec 23, 2007 10:23 pm

Linkscripts are still moonspeak for me so I might be very wrong, but I gotta ask:

Wintermute, do those changes you made work on a per-file basis, like the ITCM stuff does by automatically dealing with .itcm-named files? I'd just need to:

- Name my files foo.overlay0.cpp and bar.overlay1.cpp.
- Include foo.h and bar.h as usual.
- Make sure I load foo.ovl0 or bar.ovl1 into RAM before calling any functions defined in foo.h and bar.h, respectively.

Is that how it should work (something tells me it can't be that simple)? Being able to dynamically load/unload code would fit perfectly with the way I'm handling game state in my project: I could overlay my different game state loops and save memory by only having the current one loaded.

--EDIT--
Now that I look at it... the linkscript code seems to find the largest overlay and reserve space for it, right? So I just load the .ovl file I want to use and copy it to that same location in RAM? How do I know where to copy it to?
(BTW, you are a god among men, wintermute!)

#147586 - chishm - Mon Dec 24, 2007 12:26 am

Peter:
The overlays are all compiled and treated like normal code, so they can call functions in the surrounding app and the app can call functions in them. The thing that makes them special is you have multiple functions all occupying the same location in memory, from the linker's perspective, but there can only be one set loaded at a time. It is your responsibility to make sure the correct set of functions is loaded before they are used.

M3d10n:
__ewram_overlay_lma should give you the start address of the place to load overlays, and __ewram_overlay_alloc will give you the size of the section. As with any symbol defined in a linker script, you read the value from C by taking the symbol's address. Simply load the overlays to there, taking care not to go past the end of the section.
The following should give you the correct address to load to (overlay_section). This is untested, so take it with a grain of salt.
Code:
extern char __ewram_overlay_lma;
char* overlay_section = &__ewram_overlay_lma;

_________________
http://chishm.drunkencoders.com
http://dldi.drunkencoders.com