gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

DS development > Nester ported to DS

#46545 - MrAdults - Mon Jun 27, 2005 5:02 am

I just got into DS development earlier this week, so I'm a little new to everything still. But I thought this might be of interest to some of you here, particularly because I have included the source code for the project in my distribution. You can get it here:

http://www.telefragged.com/thefatal/index.php?content=inc_si_ds.htm

I ran into some interesting problems while getting this all put together. The main thing was the fact that the default ds_arm9 linker script in the current version of devkitARM seems to be putting all .data in dtcm, and that floods over really quick, especially if you have a lot of memory in global variables. I changed the linker script to put .data in ewram instead and had no troubles with that, but I'm curious if anyone else has run into the same issue. It took me a while to figure it out from the strange symptoms, and the linker was quiet about it probably because no sequential memory regions were defined for it to report flooding. I assume the actual issue in behaviour came up with the code in the ds_arm9 runtime stub that copies that data into different regions without looking at the size. Or maybe it was all caused by me doing something stupid that I never caught.

Anyway, hope people find it useful. I'm entirely fine with it if anyone wants to take the code that's there and run with it. I'll probably be working on it some more in the future, mainly to get the APU stuff functional, but I'm also fairly busy at work these days.

-Rich

#46548 - Dwedit - Mon Jun 27, 2005 5:32 am

What's the framerate like?
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."

#46550 - MrAdults - Mon Jun 27, 2005 6:20 am

The cpu is able to run at an ideal rate typically with a frameskip of 3 or so. The routine for writing into the framebuffer is a bit hacked together and there is a lot of room for improvement. The port has a few other deficiencies that I mention in the readme, and I'm using a trimmed down version of the standard scanline rendering loop to speed things up, which in turn does break a few games (I think it's mainly an issue for mapper4). Those that do work (I've been using Contra for testing primarily) run at a completely playable framerate.
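
For anyone unsure what a frameskip of 3 means in practice, here's a rough sketch: the CPU side is emulated for every frame, but the picture is only drawn on one frame out of every (frameskip + 1). The function names are made up for illustration and are not nester's actual API.

```c
#include <assert.h>

/* Returns 1 if this frame should be drawn, 0 if only the CPU should run. */
int should_render(int frame, int frameskip)
{
    assert(frame >= 0 && frameskip >= 0);
    return (frame % (frameskip + 1)) == 0;
}

/* Counts how many of `total_frames` emulated frames actually get drawn. */
int frames_drawn(int total_frames, int frameskip)
{
    int drawn = 0;
    for (int frame = 0; frame < total_frames; frame++)
        drawn += should_render(frame, frameskip);
    return drawn;
}
```

With a frameskip of 3, 60 emulated frames would produce 15 drawn frames.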

-Rich

#46561 - tepples - Mon Jun 27, 2005 1:08 pm

MrAdults wrote:
The routine for writing into the framebuffer is a bit hacked together and there is a lot of room for improvement.

In other words, it doesn't use PocketNES style video virtualization, right?
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#46565 - IxthusTiger - Mon Jun 27, 2005 2:16 pm

for an emulator with, as you say, not a lot of optimization, it runs pretty f***ing fantastically!

#46566 - MrAdults - Mon Jun 27, 2005 3:01 pm

tepples: Correct, it isn't mapping NES sprite memory to DS sprite memory or anything fancy like that. I'm pretty pleased with the DS hardware's ability to run it acceptably as-is, because I'm just letting nester fill up its own buffer and then copying that complete buffer into the framebuffer. There is huge optimization potential right there to modify nester's ppu code to write directly into the DS framebuffer.
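
Roughly, the copy step looks like the sketch below. This is not the code from the port; it assumes the emulator hands back a 256x240 buffer of 8-bit pixel values, that a 256-entry lookup table of ready-made 16-bit colours exists, and that 192 of the 240 rows are simply cropped to fit the DS screen. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { NES_W = 256, NES_H = 240, DS_W = 256, DS_H = 192 };

/*
 * Copy a finished 8-bit NES picture into a 16-bit DS-style framebuffer.
 * `first_row` picks which 192 of the 240 NES rows are shown (simple crop).
 * `palette` maps each 8-bit pixel value to a ready-made 16-bit colour.
 */
void blit_nes_to_ds(const uint8_t *nes, const uint16_t palette[256],
                    uint16_t *ds, int first_row)
{
    assert(nes && palette && ds);
    assert(first_row >= 0 && first_row + DS_H <= NES_H);
    for (int y = 0; y < DS_H; y++) {
        const uint8_t *src = nes + (first_row + y) * NES_W;
        uint16_t *dst = ds + y * DS_W;
        for (int x = 0; x < DS_W; x++)
            dst[x] = palette[src[x]];   /* one table lookup per pixel */
    }
}
```

Every pixel gets touched twice this way (once by the PPU code, once by the copy), which is where the optimization potential comes from.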

For the APU I'd also like to move that almost entirely to the arm7, as I'm doing pretty much nothing on the arm7 as is and it's necessary to perform sound-related hardware operations there anyway. It will be an interesting problem to further abstract that to the level of being able to run it there, but well worth the trouble I'm sure.

-Rich

#46568 - tepples - Mon Jun 27, 2005 3:15 pm

MrAdults wrote:
I'm just letting nester fill up its own buffer and then copying that complete buffer into the framebuffer. There is huge optimization potential right there to modify nester's ppu code to write directly into the DS framebuffer.

Although that might not always be desirable. You'll eventually want to do some sort of linear-interpolated scaling, though I have some ideas on how that could be done really fast with two background layers + alpha blending.
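
To show what linear-interpolated scaling computes, here is a plain software sketch that squashes one 240-pixel column into 192 pixels. It is only an illustration of the arithmetic, not the two-layer hardware blending idea, and it assumes 15-bit colours with 5 bits per channel. The function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

enum { SRC_ROWS = 240, DST_ROWS = 192 };

/* Blend two 15-bit colours (5 bits per channel); w4 is b's weight in quarters, 0..4. */
uint16_t lerp555(uint16_t a, uint16_t b, unsigned w4)
{
    uint16_t out = 0;
    assert(w4 <= 4);
    for (int shift = 0; shift <= 10; shift += 5) {
        unsigned ca = (a >> shift) & 31u;
        unsigned cb = (b >> shift) & 31u;
        out |= (uint16_t)(((ca * (4 - w4) + cb * w4) / 4) << shift);
    }
    return out;
}

/* Squash one 240-pixel column into 192 pixels; each output row advances 1.25 source rows. */
void scale_column(const uint16_t src[SRC_ROWS], uint16_t dst[DST_ROWS])
{
    for (int y = 0; y < DST_ROWS; y++) {
        int pos = y * 5;          /* source position in quarter rows */
        int row = pos / 4;        /* at most 238, so row + 1 stays in range */
        dst[y] = lerp555(src[row], src[row + 1], (unsigned)(pos % 4));
    }
}
```

Because 240:192 is exactly 5:4, the blend weights only ever take the values 0, 1/4, 1/2 and 3/4.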

Quote:
For the APU I'd also like to move that almost entirely to the arm7, as I'm doing pretty much nothing on the arm7 as is and it's necessary to perform sound-related hardware operations there anyway. It will be an interesting problem to further abstract that to the level of being able to run it there, but well worth the trouble I'm sure.

What would be really cool is if we could figure out how to access the Nintendo DS tone generators so that we can virtualize at least some of the sound generation, leaving valuable time for graphics.

Or you could run the CPU core on the ARM7 using the PocketNES engine.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#46569 - strager - Mon Jun 27, 2005 3:23 pm

I have tested it with Metroid, and it works fantastically. Metroid runs OK with a frameskip of around 4 or 5, but the problem with that is that sometimes the energies, etc. are invisible (which is a problem for me...). But with a couple of (possibly) ASM optimizations, I wouldn't be surprised if it runs more than fast enough.

You might also want a use for the Y button; maybe turbo?
And we all need a reset function. :-D

Good job, and keep up the good work.

#46573 - Extreme Coder - Mon Jun 27, 2005 4:00 pm

You can't know how happy I am to see the first playable emulator on the DS! :P
It's pretty good, speed is okay, just add sound, then optimize.
A few suggestions:
-Make buttons X and Y for autosaving/autoloading.
- Put frameskip, other options on the touch screen.
- The ability to resize the screen.
Keep up the good work.

#46579 - wintermute - Mon Jun 27, 2005 5:03 pm

MrAdults wrote:


I ran into some interesting problems while getting this all put together. The main thing was the fact that the default ds_arm9 linker script in the current version of devkitARM seems to be putting all .data in dtcm, and that floods over really quick, especially if you have a lot of memory in global variables.


It's an embedded system, using large global variables is an extremely bad idea, especially on a multiprocessor system.

Quote:

I changed the linker script to put .data in ewram instead and had no troubles with that, but I'm curious if anyone else has run into the same issue.


Another bad idea - modifying the compiler so that bad programming practices work :P

Quote:

It took me a while to figure it out from the strange symptoms, and the linker was quiet about it probably because no sequential memory regions were defined for it to report flooding. I assume the actual issue in behaviour came up with the code in the ds_arm9 runtime stub that copies that data into different regions without looking at the size.


The DS crtls in devkitARM releases 12 onwards error when memory regions are full.

dtcm was chosen for data for several reasons.

dtcm is 32bit zero wait state

the shared ram (ewram) is subject to bus arbitration and the ARM9 appears to have the lowest priority of the devices which access this memory. dtcm is only accessible by the ARM9 and is not subject to bus arbitration.

If you really, really need large arrays you should either malloc them or use EWRAM_BSS which places the variable in main memory.

Code:

char BigArray[16384]; // goes in dtcm
char BigArray[16384] EWRAM_BSS; // goes in ewram
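
The malloc route could look something like the sketch below. It assumes the heap is set up in main memory (as the suggestion above implies), and the helper names are made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { BIG_ARRAY_SIZE = 16384 };

static char *big_array;   /* only this pointer is a global; the data lives on the heap */

/* Allocate the buffer once at startup; returns 0 on success, -1 on failure. */
int big_array_init(void)
{
    assert(big_array == NULL);
    big_array = malloc(BIG_ARRAY_SIZE);
    if (big_array == NULL)
        return -1;
    memset(big_array, 0, BIG_ARRAY_SIZE);   /* match the zero fill a .bss array would get */
    return 0;
}

/* Release the buffer again, e.g. when unloading a ROM. */
void big_array_free(void)
{
    free(big_array);
    big_array = NULL;
}
```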

#46580 - MrAdults - Mon Jun 27, 2005 5:03 pm

tepples: I'd like to stay away from virtualization methods like that, because it kind of defeats the point of doing everything the Right Way. I'm also not against assembly optimization for particularly expensive routines, although the reason I chose Nester for this was because it's written entirely in C/C++ and is very well written from an ease of use and readability standpoint. I'd like to keep a balance between speed and elegance, as much as the computational boundaries will permit. This also means the emulator will be extendable to extra mappers and things by the common coder and not just people who are extremely well versed in ARM assembly. Which means, in the end, greater compatibility and fewer bugs. From an architectural standpoint I think this is the right approach, at least for an open-source project like this.

The eventual goal of linear resampling was a consideration for my implementation decision to not dig into the ppu code and rather use the existing nester wrapper functionality to keep a software buffer and write to the framebuffer manually. I can think of some ways to resize individual objects while writing them to the framebuffer in the ppu logic, but that's a bit too specific for my tastes. That's something I may contemplate once the essentials are more functional.

Regarding the APU, I also want to stay away from virtualization. I think the arm7 processor in the DS is plenty fast to emulate the rather accurate existing nester APU core, and if the arm7 isn't being devoted to anything else, I don't see a reason not to. Plenty of extra computational power in there that's just going to waste right now.

-Rich

#46582 - MrAdults - Mon Jun 27, 2005 5:18 pm

wintermute: I expected something along these lines, but there was definitely no error in the runtime from dtcm being "full". I looked at the runtime code and did not see anything that would be checking for an overflow, what am I missing in there? I'm using the runtimes included with R13. I specifically stepped through all of that code at runtime as well and it just executed as usual and happily stepped into my main() function. Like I said, the linker will error if it thinks you're stepping into another memory segment, but there's nothing defined for a bit after dtcm despite the actual space in hardware being rather small.

My current plans were/are to only move specific data (NES CPU variables in particular) to dtcm and just leave the rest in ewram. I see no downside in putting globals there that do not need to be referenced on a more-than-per-frame basis.

-Rich

#46589 - tepples - Mon Jun 27, 2005 7:11 pm

wintermute wrote:
If you really, really need large arrays you should either malloc them or use EWRAM_BSS which places the variable in main memory.

Code:

char BigArray[16384]; // goes in dtcm
char BigArray[16384] EWRAM_BSS; // goes in ewram

But wouldn't something in .sbss overlap the ROM files that are appended to the program? Or does BSS grow down from the top of the 4 MB EWRAM?
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#46590 - sajiimori - Mon Jun 27, 2005 7:33 pm

Here's another vote for EWRAM globals by default. Start with the safe thing, then change to the fast thing when necessary.

If globals are teh evil, then we won't care where they go by default because we don't use them. They may as well go in EWRAM so there are fewer posts from beginners asking why they're running out of memory (or getting memory corruption if you're putting the stack in DTCM).

#46591 - ErUs - Mon Jun 27, 2005 8:01 pm

yeah this is pretty good.

i am about 80% finished porting the source engine to the ds and about 50% with the doom3 engine.

it uses my dx9 emulation system and runs about 200 times faster than a decent spec pc.

look out for my DS overclock hardware that turns your ds into a 5ghz monster ;)

#46594 - MrAdults - Mon Jun 27, 2005 8:37 pm

Quote:
Here's another vote for EWRAM globals by default. Start with the safe thing, then change to the fast thing when necessary.

If globals are teh evil, then we won't care where they go by default because we don't use them. They may as well go in EWRAM so there are fewer posts from beginners asking why they're running out of memory (or getting memory corruption if you're putting the stack in DTCM).


This is one way of dealing with things, but I would have been happy if the runtimes (or more ideally the linker) had yelled at me about this so that I did not have to dig down into things to figure out the issue. At the point the runtimes would be checking there isn't much way to give the user feedback, but even if it stopped execution or went into a special loop, I would have seen the problem immediately when I went to step through it in a debugger. I had foolishly assumed that if something so vital were going wrong I would surely see an error about it somewhere, which left me debugging my own code needlessly for some time.

Unnecessary logic in an integral library is never an ideal thing either, but it's only a startup check. Special debug runtimes would probably be the correct way to go, and would allow more general padding and developer friendliness all around. Not really criticizing here, though, I would expect a lack of bells and whistles like this given the early state of things. Just giving some feedback with the impressions I've had from just diving into things.
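
The kind of startup check I have in mind would be something like this sketch. It is not code from the actual runtimes; the function names are hypothetical, the start and end addresses would have to come from whatever symbols the linker script exports, and the only hard number is the 16 KB size of dtcm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DTCM_SIZE = 16 * 1024 };  /* the ARM9 data TCM is 16 KB */

/* Returns 1 if [start, end) fits inside a region of `capacity` bytes. */
int section_fits(uintptr_t start, uintptr_t end, size_t capacity)
{
    assert(end >= start);
    return (size_t)(end - start) <= capacity;
}

/* Debug-runtime style guard: park the CPU where a debugger will find it. */
void check_dtcm_or_hang(uintptr_t data_start, uintptr_t bss_end)
{
    if (!section_fits(data_start, bss_end, DTCM_SIZE)) {
        for (;;) {
            /* dtcm overflow: break here in the debugger */
        }
    }
}
```

Stepping through startup would then stop in that loop instead of walking happily into main().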

That said, yeah, if you know what you're doing then you know where to put memory to optimize things. If you don't know what you're doing, your stuff is going to run slower, but it can at least work for people who just want to make trivial things and not worry about where their globals are going.

-Rich

#46600 - wintermute - Mon Jun 27, 2005 10:15 pm

looks like something has changed with binutils 2.16.1 :/

release 12 did complain when the section overflowed, r13 appears not to. I'll look into it.

#46615 - wintermute - Tue Jun 28, 2005 3:08 am

linker now complains when bss overflows, updated linkscripts have been uploaded.

http://devkitpro.sourceforge.net/

#46656 - ethoscapade - Tue Jun 28, 2005 6:12 pm

awesome.

now let's port pocketnes to ds =)

map B to Y, A to B, and use the DS' vertical resolution so scaling is less necessary, and you have perfect NES emulation.
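
The remapping itself is trivial, something like the sketch below. The DS bit positions are assumed to follow the usual libnds key layout with 1 meaning pressed, and the NES bits follow the order the pad is read in (A, B, Select, Start, Up, Down, Left, Right). All the names are made up for the example.

```c
#include <stdint.h>

/* DS key bits, assumed to match the usual libnds layout (1 = pressed). */
enum {
    DS_KEY_A = 1 << 0, DS_KEY_B = 1 << 1, DS_KEY_SELECT = 1 << 2,
    DS_KEY_START = 1 << 3, DS_KEY_RIGHT = 1 << 4, DS_KEY_LEFT = 1 << 5,
    DS_KEY_UP = 1 << 6, DS_KEY_DOWN = 1 << 7, DS_KEY_Y = 1 << 11
};

/* NES pad bits, in the order the controller is read. */
enum {
    NES_A = 1 << 0, NES_B = 1 << 1, NES_SELECT = 1 << 2, NES_START = 1 << 3,
    NES_UP = 1 << 4, NES_DOWN = 1 << 5, NES_LEFT = 1 << 6, NES_RIGHT = 1 << 7
};

/* NES A sits on DS B and NES B on DS Y, so the thumb position matches a real pad. */
uint8_t ds_to_nes_pad(uint16_t ds_keys)
{
    uint8_t pad = 0;
    if (ds_keys & DS_KEY_B)      pad |= NES_A;
    if (ds_keys & DS_KEY_Y)      pad |= NES_B;
    if (ds_keys & DS_KEY_SELECT) pad |= NES_SELECT;
    if (ds_keys & DS_KEY_START)  pad |= NES_START;
    if (ds_keys & DS_KEY_UP)     pad |= NES_UP;
    if (ds_keys & DS_KEY_DOWN)   pad |= NES_DOWN;
    if (ds_keys & DS_KEY_LEFT)   pad |= NES_LEFT;
    if (ds_keys & DS_KEY_RIGHT)  pad |= NES_RIGHT;
    return pad;
}
```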

#46659 - tepples - Tue Jun 28, 2005 8:23 pm

ethoscapade wrote:
map B to Y, A to B, and use the DS' vertical resolution so scaling is less necessary, and you have perfect NES emulation.

Better yet, use the α-Lerp technique that I got working on GBA, and scaling will look perfect.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

#46873 - TheChuckster - Fri Jul 01, 2005 4:06 pm

PocketNES is all in assembly. Should be very portable though.

#46879 - ethoscapade - Fri Jul 01, 2005 6:35 pm

i'll be happy when we can feed all GBA code through DS mode and take care of all the ridiculous things that nintendo left out (wireless multiplayer, map the buttons properly for chrissakes, B->Y and A->B).

#46969 - TheChuckster - Sat Jul 02, 2005 8:13 pm

That likely won't happen.