gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

ASM > asm no jutsu!

#6019 - lordmetroid - Thu May 15, 2003 1:22 pm

How good is gcc at compiling?
I have been thinking of writing some assembler for my 3D engine to gain speed. However, I have never coded assembler before (though I know how it works and all the essentials to just jump in and start coding).
I wonder whether it is worth writing asm at all: would I be able to outdo the compiler?

P.S. It seems from the topic title that I have been watching too much Naruto, ne?
_________________
*Spam*
Open Solutions for an open mind, www.areta.org

Areta is an organization of coders coding mostly open source projects, but there are a lot of sections like GBA dev, Language learning communities, RPG communities, etc...

#6021 - Torlus - Thu May 15, 2003 2:31 pm

If you want to gain speed, you need to move to assembly.
When I work on my 3D engine, I usually write a C function to test an algorithm, then re-write (and re-think) it in ARM assembly.
I then usually get a speed increase of 2x or 3x.
And with that lovely ARM instruction set, it is not as painful as one might believe.
_________________
GBA,GC,NGPC,GP32,FPGA,DS stuff at http://torlus.com/

#6034 - torne - Thu May 15, 2003 6:26 pm

You will be able to beat the compiler if you use ARM instructions, certainly. =)
Beating it at Thumb code generation is a little more challenging, as the Thumb instruction set is restricted to a point where all my favourite tricks don't work. I've had to come up with a new set, most of which abuse the system impressively. I can still get 66% code size and 120% speed or more by hand-coding Thumb.

If all your speed-critical routines are small enough to fit in IWRAM where 32-bit code is no problem, then you'll have no trouble. If you need to use 16-bit code, you will need to learn whole new sets of tricks that don't apply to other asm dialects.

Mind you, I *like* ASM, and am writing substantial portions of my code in it for no real reason other than I can. =)

Torne

#6042 - lordmetroid - Thu May 15, 2003 7:36 pm

Yes, one could code asm really well...
But as I said, I have never coded asm before. Will I be able to outdo the compiler?
Torlus: That's exactly what I have been thinking of doing: code in C first.

#6059 - Torlus - Fri May 16, 2003 9:16 am

"good" ASM coding is very different from C coding.
I mean, if you simply translate your C algorithm and code line by line into assembly, you may not outdo the compiler. For good performance, you will have to re-think data storage and algorithms.

Personally, I often have to re-think and modify my ASM code and data structures two or three times before getting a well-optimized version.

If you're new to ASM, you may find it more difficult, but with some practice, and especially with ARM assembly, you will find that what can take many lines of code in C can sometimes be replaced with one instruction.
Addressing modes in ARM are very powerful, and some instructions (like MLA) are very good for the matrix operations needed in 3D programming.
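
To make the MLA remark concrete, here is a minimal C sketch of the multiply-accumulate pattern in a 3x3 matrix times vector product. The names row_dot3 and mat3_mul_vec3 are made up for illustration and are not from anyone's engine. Each "acc += a * b" step has the shape that MLA computes in one instruction; whether a given compiler actually emits MLA for it depends on the compiler and its flags.

```c
#include <stdint.h>

/* Hypothetical helper: dot product of one matrix row with a vector.
   Plain integers here; overflow handling is left out of this sketch. */
static int32_t row_dot3(const int32_t row[3], const int32_t v[3])
{
    int32_t acc = row[0] * v[0];   /* plain multiply (MUL) */
    acc += row[1] * v[1];          /* multiply-accumulate shape (MLA) */
    acc += row[2] * v[2];          /* multiply-accumulate shape (MLA) */
    return acc;
}

/* Hypothetical helper: out = m * v for a 3x3 matrix. */
static void mat3_mul_vec3(const int32_t m[3][3], const int32_t v[3],
                          int32_t out[3])
{
    for (int i = 0; i < 3; i++)
        out[i] = row_dot3(m[i], v);
}
```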

So I recommend that you don't hesitate to go into assembly, as it will not be too hard to learn (ARM has few opcodes).

#6076 - DekuTree64 - Fri May 16, 2003 4:30 pm

Actually, you can use smull to multiply two 16.16 fixed-point numbers and then just put the upper 32 bits of the result in the register you want, so you don't have to do any shifting at all. smlal can do the same while adding to the previous result at the same time, and it's just beautiful to see how well that works for matrices ^^ And you get 64-bit precision too.
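
As a rough C model of what smull and smlal compute, the sketch below uses a 64-bit intermediate. The names fix16, FIX_ONE, fix_mul_wide, fix_mul_int, fix_mul and fix_dot3 are hypothetical. One thing worth checking against your own number formats: with both operands in 16.16, the 64-bit product is in 32.32 format, so the high word on its own is the integer part of the product, and a 16.16 result comes from bits 16..47 (a shift by 16). The sketch shows both. It also assumes that right-shifting a negative value is an arithmetic shift, which is what GCC does but is implementation-defined in C.

```c
#include <stdint.h>

typedef int32_t fix16;               /* hypothetical name: 16.16 fixed point */

#define FIX_ONE ((fix16)1 << 16)     /* 1.0 in 16.16 */

/* 32x32 -> 64-bit signed multiply: the operation SMULL performs.
   For two 16.16 inputs the result is in 32.32 format. */
static int64_t fix_mul_wide(fix16 a, fix16 b)
{
    return (int64_t)a * b;
}

/* High word of the product: the integer part when both inputs are 16.16. */
static int32_t fix_mul_int(fix16 a, fix16 b)
{
    return (int32_t)(fix_mul_wide(a, b) >> 32);
}

/* Product narrowed back to 16.16 (bits 16..47 of the 64-bit result). */
static fix16 fix_mul(fix16 a, fix16 b)
{
    return (fix16)(fix_mul_wide(a, b) >> 16);
}

/* SMLAL-style accumulation: add up full 64-bit products and narrow
   once at the end, so no precision is lost between terms. */
static fix16 fix_dot3(const fix16 a[3], const fix16 b[3])
{
    int64_t acc = 0;
    for (int i = 0; i < 3; i++)
        acc += (int64_t)a[i] * b[i];
    return (fix16)(acc >> 16);
}
```

For example, with a = 1.5 (0x18000) and b = 2.5 (0x28000), fix_mul_int gives 3 and fix_mul gives 3.75 in 16.16.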

And when you're doing a lot of things with the same variable (particularly in loops), it makes a huge difference to keep it in a register until you're finished, so you only have to load/store it once. It can be pretty hard to come up with the best optimizations with 13-15 registers to work with (depending on whether you use sp and lr), but it's a lot of fun, and much more gratifying than C code when you get something just right.
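
The same load/store-once idea can be sketched in C, which is handy when prototyping a routine before rewriting it in asm. The function names below are invented for the example. In the first version the running total lives behind a pointer, and because that pointer could alias the input array the compiler may have to load and store it on every pass. In the second, the total lives in a local that can stay in a register, with a single store at the end. Both return the same result; how much difference it makes depends on the compiler and optimization level.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulates through memory: *total may be re-read and re-written
   on every iteration. */
static void sum_via_memory(const int32_t *data, size_t n, int32_t *total)
{
    *total = 0;
    for (size_t i = 0; i < n; i++)
        *total += data[i];
}

/* Accumulates in a local: one store when the loop is finished. */
static void sum_via_local(const int32_t *data, size_t n, int32_t *total)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    *total = acc;
}
```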