gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

DS development > Massive performance differences between DeSmuME and no$gba

#173835 - JonP01 - Tue May 04, 2010 2:42 am

Hi,

I am currently working with a programmer on a DS homebrew project, updating a chess program. One of the features we have implemented is a "nodes per second" display, which effectively tells us how fast the program is executing each iteration of its search loop while it is thinking about its move.
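A minimal sketch of how such a display can be computed, written in C since that is what DS homebrew is usually written in. The names `g_nodes_searched`, `count_node` and `nodes_per_second`, and the generic tick counter running at `tick_hz`, are hypothetical and not taken from the actual program:

```c
#include <stdint.h>

/* Hypothetical global node counter: the engine's search function
   would call count_node() once per node visited. */
static uint32_t g_nodes_searched = 0;

static inline void count_node(void) { g_nodes_searched++; }

/* Nodes per second from a node count and the number of clock ticks
   elapsed. tick_hz is the rate of whatever clock the program uses
   (assumed here to be some hardware or software tick counter).
   The 64-bit intermediate keeps nodes * tick_hz from overflowing.
   Returns 0 if no time has elapsed yet. */
uint32_t nodes_per_second(uint32_t nodes, uint32_t elapsed_ticks, uint32_t tick_hz)
{
    if (elapsed_ticks == 0)
        return 0;
    return (uint32_t)(((uint64_t)nodes * tick_hz) / elapsed_ticks);
}
```

The 64-bit multiply matters on a 32-bit CPU: with a tick rate in the MHz range, `nodes * tick_hz` at around 8,000 nodes already exceeds 32 bits within the first second.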

This was working fine in the no$gba emulator - or so I thought. When I run the program in no$gba, I will typically get a node count of ~8,000 per second.

I never thought anything untoward about this until I sent a copy of the latest build to an acquaintance to run on his Nintendo DSiXL console.

He then reported back to me that the program was working fine, but the nodes per second were roughly half of what I had been getting in the no$gba emulator.

I then set up a special test position so that we could objectively compare the differing speeds. And the result was what I was dreading - the DSiXL was running at about 40% of the speed of the no$gba emulation.

I then downloaded the DeSmuME emulator and got yet another result - around 2,700 nodes per second - so even slower than the DSiXL.

I am now starting to suspect that the no$gba emulator is actually running this program at DSi speed (i.e. 133 MHz) instead of DS speed (67 MHz). Can anyone confirm whether this is possible? I don't see any way to switch the no$gba emulator between DS mode and DSi mode, but in all the comparisons I have made, the processing speed of the no$gba emulator has even exceeded that of a 100 MHz ARM processor with far larger hash tables (the same chess engine we are using was benchmarked on a 100 MHz ARM with a 12 MB RAM hash).

Incidentally, I should add that in all cases the emulators have been running at 100% speed the whole time, though because our program has its own internal clock, that wouldn't actually make any difference anyway - the seconds would just go by more slowly, but the nodes per second would remain consistent.
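Put as a formula (a sketch, assuming the emulator advances the program's internal clock in step with the emulated CPU): let s be the emulation speed as a fraction of real time, T the wall-clock seconds that pass, and r the number of nodes the emulated CPU searches per emulated second.

```latex
\begin{aligned}
t_{\text{emu}} &= s\,T \\
N &= r\,t_{\text{emu}} = r\,s\,T \\
\text{nps} &= \frac{N}{t_{\text{emu}}} = \frac{r\,s\,T}{s\,T} = r
\end{aligned}
```

The host speed s cancels, so the displayed figure only reflects how many nodes the emulated CPU manages per emulated second. If the emulator's cycle timing is optimistic, the nps figure comes out too high no matter how fast the PC is.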

So I now have three different levels of performance:

DeSmuME ~2,700 nps
no$gba ~8,000 nps
DSiXL console ~3,500 nps
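To put the three figures on a common footing, here is a small helper (hypothetical, integer arithmetic only) that expresses one measurement as a percentage of another:

```c
/* Express a measured nps value as a percentage of a reference
   measurement. Integer division, so the result is truncated.
   Returns 0 if the reference is 0. */
unsigned percent_of(unsigned nps, unsigned reference_nps)
{
    if (reference_nps == 0)
        return 0u;
    return (unsigned)((unsigned long long)nps * 100u / reference_nps);
}
```

With the numbers above, `percent_of(3500, 8000)` gives 43 (the console at roughly 43-44% of the no$gba figure), `percent_of(8000, 3500)` gives 228 and `percent_of(2700, 3500)` gives 77.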

Which one can I trust? You might say the console itself, but maybe the flashcard is doing something to make it run at the speed it does. Then again, there is the evidence that the no$gba emulator is outperforming a 100 MHz ARM with much more usable RAM (the RAM is used for hash tables, which enable a chess program to find the solution to a problem faster).

So far as I can tell, everything points to the console being "right" and the two emulators being wrong, with no$gba too fast and DeSmuME too slow.

#173836 - Pete_Lockwood - Tue May 04, 2010 3:09 am

The simple answer is that NO$ development ceased before the DSi existed, so NO$ isn't intentionally doing anything but trying to emulate a vanilla DS. Last I saw (admittedly some time ago) DeSmuME was pretty slow, so it wouldn't surprise me if it can't run at real speed.

Pure guess, but it's probably hard for NO$ to emulate the DS at exactly 100% speed, because you're talking about very fine slices of time that it would need to match. I'd be surprised if the card has any effect on node evaluation time, as that's all highly CPU-intensive and doesn't use the card at all. The only case where the card comes into play is reading from the opening book - and there any teeny slowdown is a tiny price to pay compared with getting the move "for free".

Thinking further, it's possible games appear to run closer to 100% speed because they probably rely quite heavily on doing stuff based on timers, which provide a natural resynchronization point - NO$ could be ahead of where the DS would be but then has to wait for the same timer the DS waits for before performing the next action.
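A toy model of that resynchronization argument (an assumption about typical game structure, not a description of how NO$ works internally): each frame the game does some CPU work, then waits for the next timer or vblank event. The function name and cycle counts are hypothetical.

```c
/* Toy model: a frame starts at frame_start, the game spends
   work_cycles of CPU time, then waits for the next timer event,
   which fires every frame_length cycles. Returns the cycle at which
   the frame actually ends. */
unsigned frame_end_cycle(unsigned frame_start, unsigned work_cycles,
                         unsigned frame_length)
{
    unsigned work_done = frame_start + work_cycles;
    unsigned next_event = frame_start + frame_length;

    if (frame_length == 0)      /* no timer: nothing to wait for */
        return work_done;

    /* If the work overruns the frame, the game misses this event and
       waits for the following one. */
    while (next_event < work_done)
        next_event += frame_length;
    return next_event;
}
```

`frame_end_cycle(0, 500, 1000)` and `frame_end_cycle(0, 900, 1000)` both return 1000: an emulated CPU that finishes the work early still ends the frame at the same point. A chess search never waits on a timer, so any CPU speed difference shows up directly in the node count.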

That's my 1/2c.
_________________
It's not an illusion, it just looks like one.

#173837 - JonP01 - Tue May 04, 2010 3:26 am

Hi Pete,

Well, it is most likely back to the drawing board for me with all those "game in" levels I carefully calibrated. I am going to have to come up with new ones - they have all been thrown completely out of whack by the emulator discrepancy, so I will have to calibrate them on the actual console only.

You know that strange thing I noticed where the program would move inordinately slowly during the first 7-8 moves on a "game in" level? I put this odd behaviour down to a quirk in the engine, but it now seems to be related to the emulator, because on the new emulator the program responds in a timely manner (3 minutes in the new emulator versus 10 minutes in no$gba). Those problems simply don't exist on the other emulator I am now trying, so they may well be gone on the actual console too.

At least, once I get the flashcard, I can set about the task of re-specifying those levels, and I will also have the benefit of knowing exactly how much this program chews through the batteries.

#173838 - Dwedit - Tue May 04, 2010 3:42 am

NO$GBA doesn't emulate the cache; it emulates memory as if it's always cached. If you need faster memory, use the DS's fast memory instead of the slow main memory.
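As a sketch of "use faster memory instead": with libnds, small, frequently accessed data can be placed in the ARM9's DTCM using the `DTCM_BSS` attribute macro. The killer-move table, `MAX_PLY` and the 16-bit move encoding below are hypothetical examples. The `#ifdef ARM9` fallback assumes the usual devkitARM build, which defines `ARM9`, and lets the same file compile on a PC. DTCM is only 16KB, and in the default libnds setup the stacks live there too, so only small tables belong in it. A multi-megabyte hash table has to stay in main RAM.

```c
#include <stdint.h>

#ifdef ARM9
#include <nds.h>    /* libnds: DTCM_BSS places data in the ARM9's DTCM */
#else
#define DTCM_BSS    /* host build: ordinary .bss, no special placement */
#endif

#define MAX_PLY 64  /* hypothetical maximum search depth */

/* Two "killer" moves per ply, stored as hypothetical 16-bit move
   codes (0 = no move). 64 * 2 * 2 bytes = 256 bytes. */
static uint16_t killer_moves[MAX_PLY][2] DTCM_BSS;

/* Remember a quiet move that caused a beta cutoff at this ply. */
void store_killer(int ply, uint16_t move)
{
    if (ply < 0 || ply >= MAX_PLY || killer_moves[ply][0] == move)
        return;
    killer_moves[ply][1] = killer_moves[ply][0];  /* older killer moves down */
    killer_moves[ply][0] = move;
}

/* Nonzero if move is one of the two killers stored for this ply. */
int is_killer(int ply, uint16_t move)
{
    return ply >= 0 && ply < MAX_PLY &&
           (killer_moves[ply][0] == move || killer_moves[ply][1] == move);
}
```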
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."

#173840 - Pete_Lockwood - Tue May 04, 2010 3:50 am

3.25MB of cache would be nice, but I don't think we'll convince the DS ;)

#173844 - sverx - Tue May 04, 2010 9:54 am

Usually no$gba is quite accurate, but I've also noticed that it sometimes performs better than my DS Lite. As Dwedit already said, no$ doesn't emulate the caches, which is one reason for these differences. There are also a lot of glitches that affect the way the ARM9 works with and accesses memory (you can read more about them in Martin Korth's GBATEK), and of course no$gba doesn't emulate those either.

Of course you don't have 3.25MB of cache, but you've got ITCM, DTCM (16KB) and the shared WRAM (32KB) accessible to the ARM9. You'll probably be able to speed things up by putting a 'portion of your work' in WRAM and accessing it there... Also, placing the most frequently used code in ITCM (32KB total) will help a lot.
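For the ITCM part, libnds provides the `ITCM_CODE` attribute macro, which places a function in ITCM. The material-only evaluation and board encoding below are hypothetical; the point is only where the hot code lives. The `#ifdef ARM9` fallback assumes the usual devkitARM build, which defines `ARM9`, so the same file also compiles on a PC.

```c
#include <stdint.h>

#ifdef ARM9
#include <nds.h>    /* libnds: ITCM_CODE places a function in the ARM9's ITCM */
#else
#define ITCM_CODE   /* host build: ordinary .text, no special placement */
#endif

#define BOARD_SQUARES 64
#define EMPTY 0     /* hypothetical board encoding: 0 = empty square */

/* Hypothetical material-only evaluation, called at every leaf node.
   board[] holds signed piece codes in the range -6..6 (positive =
   white, negative = black, 0 = empty); piece_value[] is indexed by
   the absolute piece code. A small, hot function like this is the
   kind of code worth moving into ITCM. */
ITCM_CODE int evaluate_material(const int8_t board[BOARD_SQUARES],
                                const int16_t piece_value[7])
{
    int score = 0;
    for (int sq = 0; sq < BOARD_SQUARES; sq++) {
        int p = board[sq];
        if (p > EMPTY)
            score += piece_value[p];
        else if (p < EMPTY)
            score -= piece_value[-p];
    }
    return score;  /* positive favours white */
}
```

Given how far the emulators diverge, which functions to move is best decided by timing on real hardware. The 32KB of ITCM fills up quickly.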