gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies. A new forum can be found here.

C/C++ > Trying to understand what happens during the build process

#168488 - Tyler24 - Thu Apr 30, 2009 5:35 pm

So, I'm embarking on a project this summer, and I really want to understand the internals of the DS and how to write C code for it with those internals in mind. Right now, I have a basic arm9.c and arm7.c. Both enter an infinite while loop as soon as possible, but arm9.c first enables one of the LCDs (turning all of its pixels black), so I know that my binary is working.

When compiling, I passed -specs=ds_arm9.specs and -specs=ds_arm7.specs, respectively. Can somebody give me an idea of what these specification files say, and what they link to? I tried opening them and poking around but didn't really understand much. Are these specification files just mapping out IWRAM and such, so that the programmer can choose which functions to copy to fast memory before main() gets called?

The working .nds file I have built is only about 3 KB, so there has to be something in there...

Edit: I see that ds_arm*_crt0.s and ds_arm*_crt0.o are being included into my program. I'm assuming these are the only four files? crt0.o doesn't look like it has any readable data, but crt0.s looks like it tells the DS where its basic memory locations are and initializes them... or something.
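Besides configuring the hardware, a crt0 typically prepares memory for C before calling main(): it copies initialized globals (.data) from where they are loaded to where the code expects them, and zeroes .bss. The host-side sketch below shows those two steps in plain C. It is a generic illustration, not the contents of ds_arm*_crt0.s; the arrays and the names crt0_tail and fake_main are hypothetical stand-ins for linker-script symbols and the real main().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for linker-script symbols: the initial values of
   .data live in the loaded image (load address) and must be copied to
   where the code expects them (run address). */
static const uint32_t data_load_image[4] = { 1, 2, 3, 4 };
static uint32_t data_run_area[4];

/* Hypothetical .bss area, filled with junk to mimic uninitialized RAM. */
static uint32_t bss_area[4] = { 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF, 0xDEADBEEF };

static int fake_main_ran = 0;

/* Hypothetical C entry point standing in for main(). */
static int fake_main(void)
{
    fake_main_ran = 1;
    return 0;
}

/* Generic crt0 tail: runs after CPU/MPU/stack setup, before main(). */
static int crt0_tail(void)
{
    /* Copy initialized globals from their load address to their run address. */
    memcpy(data_run_area, data_load_image, sizeof data_run_area);

    /* Zero .bss so uninitialized globals start at 0, as C requires. */
    memset(bss_area, 0, sizeof bss_area);

    /* Hand control to the C program. */
    return fake_main();
}
```

On real hardware the same loops are usually written in assembly, because they run before any C environment exists.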

#168489 - Dwedit - Thu Apr 30, 2009 7:40 pm

crt0.o is just a compiled version of crt0.s
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."

#168491 - sajiimori - Thu Apr 30, 2009 9:04 pm

Well, for starters, the DS has a protection unit that needs to be configured at startup to determine your memory layout and read/write permissions -- that's typically one of the first things that needs to happen, in crt0. I suspect that's what you're seeing there.
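To make the protection-region registers concrete, here is a host-side C sketch that builds the value written to one region register (CP15 c6). It assumes the ARM946E-S layout described in GBATEK: bit 0 enables the region, bits 1-5 hold a size field N with size = 2^(N+1) bytes (4KB to 4GB), and the upper bits hold a base address that must be aligned to the region size. The function name mpu_region is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build a CP15 c6 protection region value: base | (N << 1) | enable,
   where the region covers 2^(N+1) bytes starting at base. */
static uint32_t mpu_region(uint32_t base, uint64_t size_bytes)
{
    unsigned n = 11; /* smallest legal size field: 2^12 = 4KB */
    while (n < 31 && (UINT64_C(1) << (n + 1)) < size_bytes)
        n++;

    /* Size must be a power of two between 4KB and 4GB. */
    assert((UINT64_C(1) << (n + 1)) == size_bytes);
    /* Base must be aligned to the region size. */
    assert(((uint64_t)base & (size_bytes - 1)) == 0);

    return base | (n << 1) | 1u;
}
```

For example, mpu_region(0x04000000, 64MB) yields 0x04000033, the same value as the hand-written Region 0 constant (0b11001 << 1 | 0x04000000 | 1) in the listing later in this thread.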

#168492 - Tyler24 - Fri May 01, 2009 12:39 am

Alright, thanks for the responses.

I've pretty much copied the entire crt0.s file; I just rewrote it and made a few small changes to where things are placed in memory. The addresses in crt0.s differ from the ones Martin Korth suggests, which are based on where the retail ROMs place them...

Anyway, this didn't break anything, so I'm assuming it works. It's hard to tell right now whether the cache and so on are working properly, but that should become evident later anyway.

I still haven't set up the stack addresses and the rest of what C needs, but after seeing how inefficient a lot of compiled C code is, I might just go full-blown assembly. Maybe I'll change my mind once I realize how slow progress is :P.

arm9_main.s:
Code:
@ --------------------------------------------------------------------------------
@  arm9_main.s - Sets up the protection unit and initializes the system.
@ --------------------------------------------------------------------------------



@ --------------------------------------------------------------------------------
@  Crucial assembler information
@ --------------------------------------------------------------------------------
.arch armv5te
.cpu arm946e-s
.align 4
.arm



@ --------------------------------------------------------------------------------
@  Global declarations
@ --------------------------------------------------------------------------------
.global arm9_main



@ --------------------------------------------------------------------------------
@  Program entry point. Invalidate caches, map ITCM/DTCM, set up the protection unit.
@  This code is used courtesy of Sasq; the addresses have been slightly modified.
@ --------------------------------------------------------------------------------
arm9_main:

 @ ----------------------------------------
 @  Invalidate instruction and data caches,
 @  then drain the write buffer
 @ ----------------------------------------
 mov r0, #0x00
 mcr p15, 0, r0, c7, c5, 0     @ invalidate entire ICache
 mcr p15, 0, r0, c7, c6, 0     @ invalidate entire DCache

 mcr p15, 0, r0, c7, c10, 4    @ drain write buffer

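 @ ----------------------------------------
 @  Tell the coprocessor where the tightly
 @  coupled memories (ITCM/DTCM) are mapped.
 @  Register format: base | (N << 1),
 @  virtual size = 512 << N
 @
 @  DTCM = 0x027C0000, size = 16KB (N = 5),
 @         matching protection region 4
 @  ITCM = 0x00000000, size = 32MB (N = 16),
 @         the 32KB ITCM mirrored across it
 @ ----------------------------------------
 ldr r0, =0x027C000A
 mcr p15, 0, r0, c9, c1, 0     @ DTCM region register

 mov r0, #0x00000020
 mcr p15, 0, r0, c9, c1, 1     @ ITCM region register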


@ --------------------------------------------------------------------------------
@  Set up the protection unit the way commercial cartridges do. This differs
@  slightly from the stock crt0.s, based on what genuine Nintendo cartridges
@  use. This table was pulled directly from Martin Korth's documentation (GBATEK).
@
@  I have not bothered to check/verify R/W-ability for most sections...
@ --------------------------------------------------------------------------------
@
@  Region  Name            Address    Size   Cache  WBuf
@  -       Background      00000000h  4GB    -      -
@  0       I/O & VRAM      04000000h  64MB   -      -
@  1       Main Memory     02000000h  4MB    On     On
@  2       ARM7-Dedicated  027C0000h  256KB  -      -
@  3       GBA Slot        08000000h  128MB  -      -
@  4       DTCM            027C0000h  16KB   -      -
@  5       ITCM            01000000h  32KB   -      -
@  6       BIOS            FFFF0000h  32KB   On     -
@  7       Shared Mem      027FF000h  4KB    -      -
@ --------------------------------------------------------------------------------

@ Size field N (bits 1-5 of a region register): region size = 2^(N+1) bytes.
@ These #defines only take effect if the file is preprocessed (e.g. named .S).
#define PAGE_4K         (0b01011 << 1)
#define PAGE_8K         (0b01100 << 1)
#define PAGE_16K        (0b01101 << 1)
#define PAGE_32K        (0b01110 << 1)
#define PAGE_64K        (0b01111 << 1)
#define PAGE_128K       (0b10000 << 1)
#define PAGE_256K       (0b10001 << 1)
#define PAGE_512K       (0b10010 << 1)
#define PAGE_1M         (0b10011 << 1)
#define PAGE_2M         (0b10100 << 1)
#define PAGE_4M         (0b10101 << 1)
#define PAGE_8M         (0b10110 << 1)
#define PAGE_16M        (0b10111 << 1)
#define PAGE_32M        (0b11000 << 1)
#define PAGE_64M        (0b11001 << 1)
#define PAGE_128M       (0b11010 << 1)
#define PAGE_256M       (0b11011 << 1)
#define PAGE_512M       (0b11100 << 1)
#define PAGE_1G         (0b11101 << 1)
#define PAGE_2G         (0b11110 << 1)
#define PAGE_4G         (0b11111 << 1)

 @ ----------------------------------------
 @  Region 0: I/O Registers
 @ ----------------------------------------
 ldr r0, =(0b11001 << 1 | 0x04000000 | 1)
 mcr p15, 0, r0, c6, c0, 0

 @ ----------------------------------------
 @  Region 1: Main Memory
 @ ----------------------------------------
 ldr r0, =(0b10101 << 1 | 0x02000000 | 1)
 mcr p15, 0, r0, c6, c1, 0

 @ ----------------------------------------
 @  Region 2: ARM7 Dedicated
 @ ----------------------------------------
 ldr r0, =(0b10001 << 1 | 0x027C0000 | 1)
 mcr p15, 0, r0, c6, c2, 0

 @ ----------------------------------------
 @  Region 3: GBA Slot
 @ ----------------------------------------
 ldr r0, =(0b11010 << 1 | 0x08000000 | 1)
 mcr p15, 0, r0, c6, c3, 0

 @ ----------------------------------------
 @  Region 4: DTCM, 16KB
 @
 @  Base must be aligned to the region
 @  size and should match the DTCM base
 @  set in CP15 c9, c1, 0
 @ ----------------------------------------
 ldr r0, =(0b01101 << 1 | 0x027C0000 | 1)
 mcr p15, 0, r0, c6, c4, 0

 @ ----------------------------------------
 @  Region 5: ITCM, 32KB
 @
 @  Base must be aligned to the region
 @  size; 0x01000000 is a mirror of the
 @  ITCM mapped at 0x00000000
 @ ----------------------------------------
 ldr r0, =(0b01110 << 1 | 0x01000000 | 1)
 mcr p15, 0, r0, c6, c5, 0

 @ ----------------------------------------
 @  Region 6: BIOS
 @ ----------------------------------------
 ldr r0, =(0b01110 << 1 | 0xFFFF0000 | 1)
 mcr p15, 0, r0, c6, c6, 0

 @ ----------------------------------------
 @  Region 7: Shared Mem
 @ ----------------------------------------
 ldr r0, =(0b01011 << 1 | 0x027FF000 | 1)
 mcr p15, 0, r0, c6, c7, 0

 @ ----------------------------------------
 @  Select which regions use the write
 @  buffer (c3) and which are cacheable
 @  (c2: data, then instruction)
 @ ----------------------------------------
 ldr r0, =0b00000010           @ write buffer: region 1
 mcr p15, 0, r0, c3, c0, 0

 ldr r0, =0b01000010           @ cacheable: regions 1 and 6
 mcr p15, 0, r0, c2, c0, 0     @ data cacheable bits
 mcr p15, 0, r0, c2, c0, 1     @ instruction cacheable bits

 @ ----------------------------------------
 @  Set the extended access permissions
 @  for instruction and data accesses,
 @  one 4-bit field per region
 @  (region 0 in bits 0-3)
 @
 @  Bits   Privileged  User
 @  0000   --          --
 @  0001   R/W         --
 @  0010   R/W         R
 @  0011   R/W         R/W
 @  0100   UNP         UNP
 @  0101   R           --
 @  0110   R           R
 @  0111   UNP         UNP
 @  1XXX   UNP         UNP
 @  (UNP = unpredictable)
 @ ----------------------------------------
 ldr r0, =0x36636633
 mcr p15, 0, r0, c5, c0, 3     @ instruction permissions

 ldr r0, =0x36333633
 mcr p15, 0, r0, c5, c0, 2     @ data permissions

 @ ----------------------------------------
 @  Enable ITCM, DTCM, both caches and the
 @  protection unit (CP15 control register)
 @ ----------------------------------------

#define ITCM_ENABLE     (1<<18)
#define DTCM_ENABLE     (1<<16)
#define ICACHE_ENABLE   (1<<12)
#define DCACHE_ENABLE   (1<<2)
#define PROTECT_ENABLE  (1<<0)

 mrc p15, 0, r0, c1, c0, 0     @ read control register
 ldr r1, =1<<18 | 1<<16 | 1<<12 | 1<<2 | 1<<0
 orr r0, r0, r1                @ set the enable bits listed above
 mcr p15, 0, r0, c1, c0, 0     @ write it back

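lcd_enable:
 mov r0, #0x04000000           @ I/O base; DISPCNT is at offset 0
 mov r2, #0x00020000           @ display mode 2: show VRAM directly
 mov r3, #0x80                 @ VRAMCNT_A: enable bank A (LCDC mode)

 str r2, [r0]                  @ DISPCNT
 str r3, [r0, #0x240]          @ VRAMCNT_A..D (only bank A enabled)

freeze:
 b freeze                      @ spin forever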
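The permission, cache and control-register constants in the listing above are just per-region fields packed into single registers. The C sketch below rebuilds them from per-region values so the hex constants can be checked against the tables in the comments; the names pack_perms, region_mask, the AP_* codes and the CP15_* macros are hypothetical, and the permission codes are taken from the table in the listing.

```c
#include <stdint.h>

/* Extended access permission codes (privileged / user), 4 bits per region. */
enum {
    AP_NONE   = 0x0, /* --  / --  */
    AP_PRW    = 0x1, /* R/W / --  */
    AP_PRW_UR = 0x2, /* R/W / R   */
    AP_RW     = 0x3, /* R/W / R/W */
    AP_PR     = 0x5, /* R   / --  */
    AP_R      = 0x6  /* R   / R   */
};

/* Pack eight 4-bit permission codes, region 0 in bits 0-3, into the value
   written to CP15 c5, c0, 2 (data) or c5, c0, 3 (instruction). */
static uint32_t pack_perms(const uint8_t perm[8])
{
    uint32_t value = 0;
    for (int region = 0; region < 8; region++)
        value |= (uint32_t)(perm[region] & 0xFu) << (4 * region);
    return value;
}

/* Build a one-bit-per-region mask, as used for the cacheable bits
   (CP15 c2) and the write-buffer bits (CP15 c3). */
static uint32_t region_mask(const int enabled[8])
{
    uint32_t mask = 0;
    for (int region = 0; region < 8; region++)
        if (enabled[region])
            mask |= UINT32_C(1) << region;
    return mask;
}

/* CP15 control register (c1) enable bits used in the listing. */
#define CP15_PU_ENABLE     (1u << 0)
#define CP15_DCACHE_ENABLE (1u << 2)
#define CP15_ICACHE_ENABLE (1u << 12)
#define CP15_DTCM_ENABLE   (1u << 16)
#define CP15_ITCM_ENABLE   (1u << 18)
```

With data permissions {RW, RW, R, RW, RW, RW, R, RW} for regions 0 to 7, pack_perms reproduces 0x36333633; marking regions 1 and 6 as cacheable gives 0b01000010.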

#168497 - sajiimori - Fri May 01, 2009 7:39 am

That's cool -- assembly is fun, especially on ARM. You'll probably get a lot of people telling you to "optimize last", or to use C++ and only hand-code a few small sections, but such advice may very well be missing the point -- it really depends on your goals.

Much of our codebase compiles down to some pretty crappy assembly... but efficiency is just one of many concerns to balance. Each hour I spend optimizing the engine is an hour I could've spent doing something else: improving the camera behavior, adding some interesting new AI interactions, making it quicker for artists to preview their work on target, making the object scaling widget easier to use in the level editor, having the build process autogenerate a file that coders have been maintaining by hand, generalizing a game-specific module so it can be used on other projects...

And those are just the self-motivated tasks; I'm ignoring the need to actually implement features on a schedule, make modifications based on constant design changes, and support other projects who are using your code.

#168505 - Tyler24 - Fri May 01, 2009 11:03 pm

Oh, I didn't mean to discredit you by any means... I had to look up what darn near every line in the code above was doing, because it was all talking to the coprocessor... and I have no idea what c0...c15 do off the top of my head. I just hate inefficient things. I can't stand languages like Java, C#, etc. that just suck up memory bandwidth like nobody's business. I decided to switch over to assembly when I saw how much stack pushing and popping was going on when functions were called... considering each memory access costs 8 or so wait states (don't quote me on that, I remember seeing it somewhere), multiply that by 8 registers to push and pop (x2) and you get 128 cycles wasted on every function call.
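The estimate above is simple arithmetic: 8 registers, each pushed once and popped once, at an assumed 8 wait states per access. Written out as a small C function (stack_call_overhead is a hypothetical name), it becomes easy to plug in other assumptions, such as a much smaller per-access penalty for a stack kept in DTCM.

```c
/* Extra cycles a call spends saving and restoring registers on the stack,
   assuming every stack access costs `wait_states` extra cycles. The
   8-wait-state figure from the post is unverified; a stack in DTCM would
   use a much smaller value. */
static unsigned stack_call_overhead(unsigned regs_saved, unsigned wait_states)
{
    unsigned accesses = regs_saved * 2u; /* each register is pushed once and popped once */
    return accesses * wait_states;
}
```

stack_call_overhead(8, 8) gives the post's 128 cycles; with a zero-wait-state stack the same call pays nothing extra.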

I'm really a beginner as far as the NDS hardware goes, but I'm halfway decent at assembly so I thought I'd give it a go. The GBATek and bottledlight sites have been a lifesaver.

#168507 - Dwedit - Sat May 02, 2009 12:36 am

The stack is usually placed in fast DTCM memory, so reads and writes to it don't pay an 8-cycle penalty.
Also, the cache does wonders for hiding memory access penalties, since accesses hit the cache far more often than they miss.
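The cache argument can be quantified with the usual average-access-cost formula: average cost = hit rate x hit cost + miss rate x miss cost. A C sketch using integer percentages (cycles_per_100_accesses is a hypothetical name; the cycle counts in the example are hypothetical, not measured DS timings):

```c
/* Total cycles for 100 memory accesses given a cache hit rate in percent
   (0..100). Average cycles per access = result / 100. All inputs are
   assumptions supplied by the caller. */
static unsigned cycles_per_100_accesses(unsigned hit_percent,
                                        unsigned hit_cycles,
                                        unsigned miss_cycles)
{
    unsigned miss_percent = 100u - hit_percent;
    return hit_percent * hit_cycles + miss_percent * miss_cycles;
}
```

For example, with a hypothetical 95% hit rate, 1-cycle hits and 9-cycle misses, 100 accesses cost 95 + 45 = 140 cycles, compared with 900 if every access missed.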

#168511 - sajiimori - Sat May 02, 2009 1:53 am

I was just thinking aloud -- I didn't think you were trying to discredit me. ^^