gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies. A new forum can be found here.

OffTopic > why does nobody do ARM7 -> ARM9 run-time code generation?

#145699 - thedoc - Tue Nov 20, 2007 6:40 pm

hi. i am wondering why nobody ever writes about generating code
at run-time using the ARM7 as generator and letting the ARM9 execute it.

think of JiT, unrolling loops, using values instead of variables, etc.

why i'm asking? because i did that. functions outputting opcodes.
in freepascal, for example: add(_al,r1,r2,x); // x = value or register
some kind of run-time assembler, i guess.
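
As a rough illustration of such a run-time assembler, here is a minimal C sketch (not thedoc's FreePascal code) that encodes the ARM ADD instruction and emits an unrolled, constant-baked sequence. The function names, the argument order (condition, destination, first operand, second operand) and the restriction to unrotated 8-bit immediates are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { COND_AL = 0xE };  /* "always" condition code */
enum { OPC_ADD = 0x4 };  /* data-processing opcode field for ADD */

/* ADD rd, rn, rm (plain register operand, no shift, flags untouched). */
uint32_t enc_add_reg(unsigned cond, unsigned rd, unsigned rn, unsigned rm)
{
    assert(cond < 16 && rd < 16 && rn < 16 && rm < 16);
    return ((uint32_t)cond << 28) | ((uint32_t)OPC_ADD << 21) |
           ((uint32_t)rn << 16) | ((uint32_t)rd << 12) | (uint32_t)rm;
}

/* ADD rd, rn, #imm8 (this sketch only handles unrotated 8-bit immediates). */
uint32_t enc_add_imm(unsigned cond, unsigned rd, unsigned rn, unsigned imm8)
{
    assert(cond < 16 && rd < 16 && rn < 16 && imm8 < 256);
    return ((uint32_t)cond << 28) | (1u << 25) | ((uint32_t)OPC_ADD << 21) |
           ((uint32_t)rn << 16) | ((uint32_t)rd << 12) | (uint32_t)imm8;
}

/* "Unrolled loop with a value instead of a variable": writes n copies of
 * ADD rd, rd, #step followed by BX LR. Returns the number of words
 * written, or 0 if the buffer is too small. */
size_t emit_unrolled_add(uint32_t *buf, size_t cap,
                         unsigned rd, unsigned step, size_t n)
{
    if (cap == 0 || n > cap - 1)
        return 0;
    for (size_t i = 0; i < n; i++)
        buf[i] = enc_add_imm(COND_AL, rd, rd, step);
    buf[n] = 0xE12FFF1Eu;  /* BX LR: return to the caller */
    return n + 1;
}
```

With these definitions, enc_add_reg(COND_AL, 1, 2, 3) yields 0xE0821003, which is the encoding of "add r1, r2, r3".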

this was actually quite easy. it took some time analyzing the opcodes,
but it was great fun, after all.

so... am i the only one who does this stuff? (besides maybe emulator-developers)
_________________
what's the difference between a black-billed magpie. ??
both legs are equally long, especially the left one.

#145701 - Mighty Max - Tue Nov 20, 2007 6:52 pm

thedoc wrote:
so... am i the only one who does this stuff? (besides maybe emulator-developers)


Well, for algorithms which are not available at compile time, there are script interpreters.

Self-modifying code (writing CPU instructions at runtime, for the runtime) is something to avoid imho. Just like GOTO statements, it makes code needlessly more complex, unreadable, less verifiable and much less portable.
_________________
GBAMP Multiboot

#145704 - thedoc - Tue Nov 20, 2007 7:14 pm

so... did you write something in that direction?

besides... i am not talking about self modifying code, it's generated during runtime by the ARM7.
useful for dynamic recompilation in, for example, a c64-emulator
which i've started, but which still contains bugs.

are these really the reasons nobody codes stuff like this? odd...

#145708 - simonjhall - Tue Nov 20, 2007 8:14 pm

Does sound interesting, but the more I think about it the more it sounds like a complete world of pain! Here are my thoughts:
1) the '7 would have to emit whole functions worth of code for the '9 to run, since making single instructions for the other processor to run would have too much overhead (what with synchronisation and all, and the instruction cache).

2) this would mean that in order to not be a waste of time the '9 would have to be running some other (translated) code at the time. If it was idle, then there would be no point in using a second processor! This would then open up more synchronisation problems - you couldn't flush a function unless you knew exactly where the main processor was.

3) any 'systemmy' calls from the translated code would have to be made to the ARM7, which would incur overhead and might outweigh the whole multithreaded nature of it. Calls like "I've branched to a piece of code which doesn't exist yet - please generate it for me".

4) the two processors are not cache coherent. Granted, the ARM7 has no caching at all, but the ARM9 does. You'd either have to use the non-cached mirror of memory (slow-as) or keep flushing cache lines whenever you change any state in the main memory that the ARM9 requires. You've also got to flush the instruction cache when you add a new piece of translated code.

5) there's got to be more!
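
To make the cache point in 4) a bit more concrete, here is a small C sketch that works out which cache lines a freshly written code block touches, i.e. the range that would have to be invalidated on the ARM9 before running it. The 32-byte line size is that of the ARM946E-S; the struct and function names are made up for this example, and the actual maintenance call (libnds has IC_InvalidateRange and DC_FlushRange for this kind of job) is left out so the sketch stays runnable on a PC.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 32u  /* ARM946E-S cache line size in bytes */

typedef struct {
    uint32_t start;  /* address of the first affected cache line */
    uint32_t size;   /* bytes covered, always a multiple of CACHE_LINE */
} line_span;

/* Round [addr, addr + size) outwards to whole cache lines.
 * Assumes size > 0 and that the range does not wrap around 2^32. */
line_span cache_line_span(uint32_t addr, uint32_t size)
{
    assert(size > 0);
    uint32_t first = addr & ~(CACHE_LINE - 1u);
    uint32_t last  = (addr + size - 1u) & ~(CACHE_LINE - 1u);
    line_span s = { first, last - first + CACHE_LINE };
    return s;
}
```

An 8-byte patch that straddles a line boundary already costs two lines, which is one reason to emit code in line-aligned blocks.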

...however, big kudos if you can pull it off! But the amount of blocking/locking required and potential cache problems is pretty scary! Let us know how you get on :-D
_________________
Big thanks to everyone who donated for Quake2

#145826 - thedoc - Fri Nov 23, 2007 8:16 am

ok, now i've got time to reply.

1) that's true. it only works if the arm7 emits a whole bunch of opcodes for the arm9 to execute, but that is not really a problem, because there is enough code to start generating first and then let the arm9 execute it, so the arm7 has enough time to do the next block. think about a mandelbrot generator for example.

2) you're building this point upon point 1. there is no problem making enough code for the arm9 to execute, especially when reaching inner loops. btw... there are no loops or branches in the generated code, this can all be handled by the arm7. it can be tricky, yes. but it's not a real problem. my unoptimized code reaches 550k to 650k+ emulated c64-opcodes per second on hardware, and that's far less than what can be achieved, and that's with occasional halts in the arm9 because of jumps in the c64-code.

3) no. first, there is no need to branch for the arm9, secondly it's always possible to call stuff like "printf" for example. i have to admit, it didn't work from the beginning, but it works after all.

4) depends. if you need to pass data between arm7 and arm9, this _could_ get tricky, but in the worst case one can use mirrored portions of main memory, which do not get cached. given that the arm7 can put the data into memory while the arm9 is still working on something else, there's no problem with performance (= no stalls).
i actually thought that the iCache would cause problems... but actually, it doesn't. in the worst case, there's the possibility of letting the arm9 jump to a different section of main memory, which is uncached, but that's not needed. i have tried changing code which is ahead of r15 (but it has to be more than 8 bytes ahead at least, afair) and it worked.
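
The "arm7 generates ahead while the arm9 executes" handoff from points 1) and 2) could be sketched as a single-producer, single-consumer queue of finished blocks. This is only a host-side illustration using C11 atomics, not the code described in this post: the descriptor fields and function names are invented, and on the real hardware the queue would have to sit in uncached memory (or be paired with cache maintenance), or the DS's hardware FIFO between the two cores could be used instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SLOTS 8u  /* must be a power of two */

typedef struct {
    uint32_t guest_pc;   /* emulated address this block was generated for */
    uint32_t code_addr;  /* where the generated ARM code was written */
    uint32_t words;      /* length of the block in 32-bit words */
} block_desc;

typedef struct {
    block_desc slot[QUEUE_SLOTS];
    atomic_uint head;  /* next slot to read, advanced only by the consumer */
    atomic_uint tail;  /* next slot to write, advanced only by the producer */
} block_queue;

void queue_init(block_queue *q)
{
    atomic_init(&q->head, 0u);
    atomic_init(&q->tail, 0u);
}

/* Generator side: publish a finished block. Returns false when full. */
bool queue_push(block_queue *q, block_desc d)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QUEUE_SLOTS)
        return false;
    q->slot[tail % QUEUE_SLOTS] = d;
    /* Release: the descriptor must be visible before the new tail is. */
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
    return true;
}

/* Executor side: fetch the next block. Returns false when none is ready,
 * which is the "halt" case where the executor has to wait. */
bool queue_pop(block_queue *q, block_desc *out)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = q->slot[head % QUEUE_SLOTS];
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}
```

As long as the generator keeps the queue non-empty, the executor never waits; a false return from queue_pop corresponds to the stalls on c64 jumps mentioned in 2).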

okay, that's a really long post, sorry for that. hope i didn't mix anything
up. and of course, extensive testing is still needed, but as far as what
i have done goes, it works quite well. i'd actually love to give a demo of
this, but there's no point, because you couldn't easily verify if the code
got generated on the fly.

#145828 - Dwedit - Fri Nov 23, 2007 11:24 am

I've only used self modifying code on the GBA to do something inline which would otherwise have been done with a function pointer, and the "function" was a single instruction.

Otherwise, there are times for using it, and times for not using it. The best time to use it is when you would end up creating nearly identical blocks of code with a small difference. Another good time to use it is to change the destination of a branch instruction, but you don't really need that on the ARM because of the ability to do indirect jumps easily.
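
For illustration, patching the destination of an ARM branch comes down to rewriting its 24-bit word offset. The following C sketch shows one way to do that; the function name is hypothetical, and it only handles conditional or unconditional B/BL in ARM state (not Thumb, not BLX).

```c
#include <assert.h>
#include <stdint.h>

/* Return a copy of the B/BL instruction `insn`, located at `insn_addr`,
 * with its offset rewritten so that it jumps to `target`. */
uint32_t retarget_branch(uint32_t insn, uint32_t insn_addr, uint32_t target)
{
    assert((insn & 0x0E000000u) == 0x0A000000u);  /* bits 27..25 = 101: B/BL */
    assert((insn >> 28) != 0xFu);                 /* 0xF would be BLX on ARMv5 */
    assert(((insn_addr | target) & 3u) == 0);     /* both word aligned */

    /* The PC reads as the instruction's address + 8 when the branch executes. */
    int32_t delta = (int32_t)(target - (insn_addr + 8u));
    assert(delta >= -(1 << 25) && delta < (1 << 25));  /* +/-32 MB reach */

    uint32_t imm24 = ((uint32_t)delta >> 2) & 0x00FFFFFFu;
    return (insn & 0xFF000000u) | imm24;  /* keep condition and link bit */
}
```

A branch to itself comes out as 0xEAFFFFFE, the familiar "b ." encoding. If the patched word has already been fetched by the ARM9, the instruction cache line holding it would still need invalidating.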

You should NOT use it if your code is going to end up on a ROM chip, or if your code will be multithreaded in any way. Self-modifying code is just as dangerous as static variables from the perspective of multithreading.
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."

#145829 - coolmos - Fri Nov 23, 2007 12:05 pm

Afaik dynamic recompilation is not the same as self-modifying code.

Dynarec recompiles complete functions for another processor.

Maybe you should ask some other dynarec emu writers for help? UltraHLE's epsilon comes to mind :-) Although i doubt he would like to cross Ninty again.

#145868 - Ant6n - Sat Nov 24, 2007 8:38 am

the real reason is probably that people who actually do JIT do it in java on fast multiprocessor systems and either write a shitload of papers about it or work for sun/ibm etc and make a shitload of moolah (or both). Firstly, java has many nice properties that make it good for jit'n'stuff, secondly its applications are much broader than ... well, a just in time c64 recompiler on a measly DS.
I've been thinking about/planning a simple x86 emulation/recompilation mishmash, but in the end it's too much work and nobody would really be interested. If your potential target audience is that small, just writing a plain'ol' emulator might be more fun/rewarding.
Also notice that your compilers run on machines that are several orders of magnitude faster than your runtime environment, and that your code is very specific for your machine, and there is no (fatlibs etc don't count) dynamic linking (all unlike in java), so the usefulness of dynamic compilation and optimization might be limited compared to static optimization, compilation or recompilation.
The idea is cool though.

#145870 - keldon - Sat Nov 24, 2007 9:28 am

Or you can also just run a VM for the purpose of scripting! Think, "Ye Mighty VM", the kick ass scripting back end.

#145871 - Ant6n - Sat Nov 24, 2007 10:00 am

..sure scripting... but a jitted one, on ds?
do scripts get generated on the DS?

#145934 - Exophase - Sun Nov 25, 2007 11:11 pm

Okay, there are three things you're possibly addressing here, all of which involve dynamic code generation:

- emulation (this is what you've been talking about), or dynamic recompilation of a low level language (like JVM bytecode)
- dynamic compilation of a high level language
- dynamic optimization (see the HP Dynamo paper)

You look like you were touching on a little of all of these. They're actually very different, even if they all involve code generation. The second two really are not ever going to have a good place on DS, so it's about emulation (I could tell you why I think they don't belong on DS but I'll wait to see if you're actually interested)

Lots of emulators use recompilers. This is usually done for high clock rate RISC CPUs, but some CISC ones have been recompiled too. For instance someone did a ZX Spectrum emulator using recompilation some years ago, for x86. Usually you don't need to do recompilation for a CISC platform because other components are more powerful than the CPU. Nonetheless, there are a lot of possible optimizations that can be done when recompiling a platform such as 6502 that just can't be done in an interpreter. I outline this here:

http://emutalk.net/showthread.php?t=42433
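
One optimization of this kind that is commonly cited for recompilers is dead flag elimination: a recompiler sees a whole block at once, so it can skip computing status flags that get overwritten before anything reads them, which an interpreter working one opcode at a time cannot know. The C sketch below shows the idea as a backward liveness pass; it is an illustration only and not necessarily what the linked thread describes, and the type and function names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits tracked by this sketch. */
enum { F_C = 1, F_Z = 2, F_V = 4, F_N = 8, F_ALL = 15 };

typedef struct {
    uint8_t reads;   /* flags this guest instruction consumes */
    uint8_t writes;  /* flags this guest instruction defines */
    uint8_t needed;  /* output: the written flags that must really be computed */
} guest_op;

/* Walk the block backwards, tracking which flags are still live. */
void mark_needed_flags(guest_op *ops, size_t n)
{
    assert(ops != NULL || n == 0);
    uint8_t live = F_ALL;  /* conservative: everything is live at block exit */
    for (size_t i = n; i-- > 0; ) {
        ops[i].needed = (uint8_t)(ops[i].writes & live);
        live = (uint8_t)((live & ~ops[i].writes) | ops[i].reads);
    }
}
```

For a sequence like LDA, LDX, ADC, the N and Z results of both loads are overwritten by the ADC before being read, so the generated code for the loads does not need to update them at all.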

Having said that, there are some drawbacks to recompilation that make it less than ideal on DS. A big one is an overall lack of memory. A related one is lack of icache (ITCM isn't going to really help you here) and expensive miss penalties.

The real thing I'm wondering from all this is, why do you want to generate code on the ARM7? When you need code to be compiled you need it done as quickly as possible for things to work smoothly. You can speculatively look ahead and compile things in the background but this means that you need to put markers in the code you do compile to prevent it from going forward to the uncompiled code. This will result in slower generated code. The actual wins are not good because the cost of recompilation is not supposed to be a constant thing (if it is you might want to reconsider doing a recompiler). Even if the recompilation is very slow you'll reach a point where you're no longer compiling fresh blocks, that is, if you have enough memory allocated for it.
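
The point about recompilation cost not being constant can be pictured with a translation cache: a block is compiled once on first use and afterwards only looked up. Here is a minimal C sketch of a direct-mapped cache keyed by guest PC; the names are hypothetical, and stub_compile is just a stand-in for a real code generator so that the example runs on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TC_SLOTS 256u  /* must be a power of two */

typedef struct {
    uint32_t guest_pc;     /* emulated address the block starts at */
    const uint32_t *code;  /* generated host code, NULL if slot is empty */
} tc_entry;

typedef struct {
    tc_entry slot[TC_SLOTS];
    unsigned compiles;     /* how often the compiler had to run */
} trans_cache;

typedef const uint32_t *(*compile_fn)(uint32_t guest_pc);

/* Stand-in for a real recompiler: always "generates" a lone BX LR. */
const uint32_t *stub_compile(uint32_t guest_pc)
{
    static const uint32_t ret_only[1] = { 0xE12FFF1Eu };
    (void)guest_pc;
    return ret_only;
}

/* Return the host code for guest_pc, compiling only on a miss. */
const uint32_t *tc_lookup(trans_cache *tc, uint32_t guest_pc, compile_fn compile)
{
    assert(tc != NULL && compile != NULL);
    tc_entry *e = &tc->slot[guest_pc & (TC_SLOTS - 1u)];
    if (e->code == NULL || e->guest_pc != guest_pc) {
        e->guest_pc = guest_pc;  /* miss: evict whatever was in the slot */
        e->code = compile(guest_pc);
        tc->compiles++;
    }
    return e->code;
}
```

Once the working set fits in the cache, the compiles counter stops growing. When two hot blocks collide in one slot it keeps growing, which is the "if you have enough memory allocated for it" caveat in miniature.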

I think the ARM7 would be much better off doing something else, and even if there is nothing else it's not worth it to try to schedule compilation on a separate CPU, especially a slower one.