#113234 - goruka - Sun Dec 24, 2006 11:33 pm
I still haven't messed with the DS much, but before spending a lot of time benchmarking, I wanted to ask some questions about the OpenGL-like registers of the DS.
1- How fast is the internal matrix multiplication? I noticed you can push/pop matrices and multiply them (4x4/4x3/3x3). Is this much faster than doing it on the ARM? (Pushing the matrix stack, loading a matrix, multiplying by another, and popping the result.)
2- How's the SQRT function (speed/accuracy)? Any idea of the error?
Well, I guess these are my main doubts so far.
Cheers and Thanks!
#113247 - HyperHacker - Mon Dec 25, 2006 9:06 am
#1, I'd imagine it's faster, or else why would it be there?
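For scale, here is a rough sketch of the ARM-side work that question 1 is comparing against. It assumes matrix entries are 32-bit fixed-point values with 12 fractional bits, stored row-major; the names mat4_mul_fx and FX_SHIFT are made up for this example and are not taken from libnds. It is only a baseline to benchmark against, not a statement about how fast the hardware is.

```c
#include <stdint.h>

#define FX_SHIFT 12                 /* assumed: 12 fractional bits */
#define FX_ONE   (1 << FX_SHIFT)    /* 1.0 in that format */

/* Multiply two 4x4 fixed-point matrices stored row-major.
   out must not alias a or b. */
void mat4_mul_fx(int32_t out[16], const int32_t a[16], const int32_t b[16])
{
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            int64_t acc = 0;        /* 64-bit accumulator avoids overflow */
            for (int k = 0; k < 4; ++k) {
                acc += (int64_t)a[row * 4 + k] * b[k * 4 + col];
            }
            /* Drop the extra 12 fractional bits. Right-shifting a negative
               value is implementation-defined in C; GCC shifts arithmetically. */
            out[row * 4 + col] = (int32_t)(acc >> FX_SHIFT);
        }
    }
}
```

That is 64 multiplies and 48 additions per 4x4 product, each multiply needing a 64-bit intermediate, which is the cost a benchmark of the hardware path would be measured against.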
_________________
I'm a PSP hacker now, but I still <3 DS.
#113252 - OOPMan - Mon Dec 25, 2006 9:41 am
With regards to the SQRT function, I can't say for sure but I'd guess the DS not having an FPU would slow that one down...
In general, avoid using SQRT when possible. This can be quite easy to do in many situations...
For example:
x^2 + y^2 = r^2
Say you want to determine whether some other line of length a is longer than the one above (length r). You could take the square root of r^2 and compare r and a directly, but this is inefficient.
It's actually better to compare r^2 to a^2. Since both lengths are non-negative, a > r exactly when a^2 > r^2, so you can determine which line is longer without resorting to SQRT.
While such trickery does not apply to every single possible situation out there, basic first-year university maths shows that it applies a lot more often than you'd imagine and can make a difficult problem very easy :-)
So, try to avoid SQRT where possible ;-)
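A minimal sketch of that comparison in C, assuming integer coordinates and a non-negative length a. The function name line_is_longer is made up for this example.

```c
#include <stdint.h>

/* Returns 1 when a line of length a is longer than the line from the
   origin to (x, y), i.e. when a^2 > x^2 + y^2. No square root is taken.
   Assumes a >= 0. */
int line_is_longer(int32_t a, int32_t x, int32_t y)
{
    /* Widen before multiplying so the squares cannot overflow. */
    uint64_t r2 = (uint64_t)((int64_t)x * x) + (uint64_t)((int64_t)y * y);
    uint64_t a2 = (uint64_t)((int64_t)a * a);
    return a2 > r2;
}
```

For the 3-4-5 triangle, line_is_longer(6, 3, 4) is true, while a = 5 (equal length) and a = 4 are not.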
_________________
"My boot, your face..." - Attributed to OOPMan, Emperor of Eroticon VI
You can find my NDS homebrew projects here...
#113340 - Juglak - Tue Dec 26, 2006 9:33 pm
Not sure about the matrices... but...
Code: |
40002B0h - NDS9 - SQRTCNT - 16bit - Square Root Control (R/W)
0 Mode (0=32bit input, 1=64bit input)
1-14 Not used
15 Busy (0=Ready, 1=Busy) (Execution time is 13 clks, in either Mode)
40002B4h - NDS9 - SQRT_RESULT - 32bit - Square Root Result (R/W?)
40002B8h - NDS9 - SQRT_PARAM - 64bit - Square Root Parameter Input (R/W)
Unsigned 64bit parameter, and unsigned 32bit result.
|
Quote: |
Execution time is 13 clks
|
If I'm not mistaken, though, libnds does NOT take advantage of these accelerated registers for sqrt...
-J
Ref: http://nocash.emubase.de/gbatek.htm
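On the accuracy question: the listing only says "unsigned 64bit parameter, and unsigned 32bit result". If the unit returns the truncated integer square root (an assumption, not something the listing states), the error is always less than 1 in the last integer place. Below is a software model of that behaviour that could be used to check the hardware output on a test build. The name isqrt64 is made up for this example.

```c
#include <stdint.h>

/* Truncated integer square root: the largest r with r * r <= n.
   Classic bit-by-bit method, no floating point needed. */
uint32_t isqrt64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;   /* highest power of four in 64 bits */

    /* Start at the highest power of four not larger than n. */
    while (bit > n) {
        bit >>= 2;
    }

    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)root;
}
```

With a fixed-point input, the fractional precision of the result depends on how far the value is shifted left before it is written to the parameter register, so the "less than 1" bound is in units of the result's least significant bit.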
_________________
My goodies: 1xDS Lite - Supercard Lite, DSi, Supercard DSONEi
#113344 - Lick - Tue Dec 26, 2006 10:03 pm
You can access the SQRT hardware with the registers defined in nds/arm9/math.h. Found out today.
_________________
http://licklick.wordpress.com
#113362 - kusma - Wed Dec 27, 2006 1:41 am
Lick wrote: |
You can access the SQRT hardware with the registers defined in nds/arm9/math.h. Found out today. |
How do you guarantee that the result is ready when you read it? I suspect this functionality is hard to take advantage of from C/C++ code, so its only big advantage is lost unless you're writing assembly code to parallelize things manually. And let's be practical here: we only write inner loops in assembly, and what kind of inner loops require normalizing? I can only think of a few, and most of them can be "tricked" away using sleazy cheats.
#113363 - Lick - Wed Dec 27, 2006 1:53 am
This?
while(SQRT_CR & SQRT_BUSY);
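Spelled out as a full blocking call, that could look roughly like the sketch below. The SqrtRegs struct, the bit macros, and hw_sqrt64 are made up for this example; the layout follows the GBATEK listing earlier in the thread (control at 40002B0h, result at 40002B4h, parameter at 40002B8h). The registers are passed in as a pointer so the same code can be tried against a plain struct on a PC.

```c
#include <stdint.h>

/* Hypothetical register block mirroring the GBATEK offsets. */
typedef struct {
    volatile uint16_t cnt;      /* +0: bit 0 = mode, bit 15 = busy */
    volatile uint16_t pad;      /* +2: unused gap before the result */
    volatile uint32_t result;   /* +4: 32-bit result */
    volatile uint64_t param;    /* +8: 64-bit input */
} SqrtRegs;

#define SQRT_MODE_64  0x0001u   /* mode bit: 64-bit input */
#define SQRT_BUSY_BIT 0x8000u   /* busy flag */

/* Write the input, spin until the busy flag clears, return the result. */
uint32_t hw_sqrt64(SqrtRegs *regs, uint64_t value)
{
    regs->cnt = SQRT_MODE_64;
    regs->param = value;
    while (regs->cnt & SQRT_BUSY_BIT) {
        /* nothing to do but wait */
    }
    return regs->result;
}
```

On the DS the call would presumably be hw_sqrt64((SqrtRegs *)0x040002B0, x). The busy loop makes the read safe, but it also means the CPU sits idle for however many of the 13 clocks are left.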
_________________
http://licklick.wordpress.com
#113364 - kusma - Wed Dec 27, 2006 2:06 am
Lick wrote: |
This?
while(SQRT_CR & SQRT_BUSY); |
Yeah, okay. This doesn't tell the scheduler to put 13 clocks between writing the registers and reading them, though. And the gcc scheduler isn't too friendly to these kinds of ("manual") stalls...
#113369 - tepples - Wed Dec 27, 2006 3:02 am
Looks like a job for inline asm, right?
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.
#113370 - kusma - Wed Dec 27, 2006 3:06 am
tepples wrote: |
Looks like a job for inline asm, right? |
Good point. That way I guess you can force some dependencies to make the read and write distant enough. Still, it's a rather ugly hack, and it reduces the scheduler's ability to generate optimal code.
#113372 - sajiimori - Wed Dec 27, 2006 3:21 am
Better yet, do real work in the meantime.
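One way to read that suggestion is to split the call into a start half and a finish half, and put unrelated work between them. A rough sketch follows; SqrtRegs, hw_sqrt64_start, hw_sqrt64_finish, and length_and_sum are all made up for this example, with the register layout taken from the GBATEK listing earlier in the thread.

```c
#include <stdint.h>

/* Hypothetical register block mirroring the GBATEK offsets. */
typedef struct {
    volatile uint16_t cnt;      /* +0: bit 0 = mode, bit 15 = busy */
    volatile uint16_t pad;      /* +2: unused gap before the result */
    volatile uint32_t result;   /* +4: 32-bit result */
    volatile uint64_t param;    /* +8: 64-bit input */
} SqrtRegs;

#define SQRT_MODE_64  0x0001u
#define SQRT_BUSY_BIT 0x8000u

/* Kick off the calculation and return immediately. */
void hw_sqrt64_start(SqrtRegs *regs, uint64_t value)
{
    regs->cnt = SQRT_MODE_64;
    regs->param = value;
}

/* Wait only if the unit is still busy, then fetch the result. */
uint32_t hw_sqrt64_finish(const SqrtRegs *regs)
{
    while (regs->cnt & SQRT_BUSY_BIT) {
        /* normally already clear if enough work was done in between */
    }
    return regs->result;
}

/* Example caller: sum an array while the square root is in flight. */
uint32_t length_and_sum(SqrtRegs *regs, uint64_t len_sq,
                        const int32_t *data, int count, int64_t *sum_out)
{
    hw_sqrt64_start(regs, len_sq);

    int64_t sum = 0;                    /* independent work */
    for (int i = 0; i < count; ++i) {
        sum += data[i];
    }
    *sum_out = sum;

    return hw_sqrt64_finish(regs);
}
```

The volatile accesses keep their order relative to each other, and the busy check keeps the result correct either way. The compiler is still free to move the non-volatile work around them, so whether the overlap actually happens has to be checked in the generated code.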
#113373 - kusma - Wed Dec 27, 2006 3:27 am
sajiimori wrote: |
Better yet, do real work in the meantime. |
But how do you control the order of instructions in a program that has been transformed to SSA form before scheduling?