#177103 - blessingta@hotmail.co.uk - Sun Dec 11, 2011 12:56 pm
Hi
My asm code optimisation is bound to be worse than the compiler's. But if C's optimisations were to be criticised, where would the weak points be? Also, is there any documentation on the topic?
#177104 - Dwedit - Sun Dec 11, 2011 5:28 pm
You might want to clarify your post a bit, it's hard to read.
Most of the lack of optimization in C comes from following the ARM Procedure Call Standard (APCS) too rigidly. Register usage across function calls is fixed: a function receives its first four 32-bit arguments in r0, r1, r2 and r3 (any further arguments go on the stack) and returns one 32-bit value in r0, so anything else it needs to hand back has to go through memory, usually the stack. (It's also possible to cheat and get two 32-bit values back in r0 and r1, because a 64-bit result is returned in that register pair.)
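A minimal sketch of the r0/r1 trick, assuming a little-endian ARM target and a GCC-style compiler: the two results are packed into a uint64_t, which the calling convention returns with the low word in r0 and the high word in r1. The names pack2 and minmax_u32 are hypothetical, chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack two 32-bit results into one 64-bit return value:
   lo goes in the low word, hi in the high word. */
static inline uint64_t pack2(uint32_t lo, uint32_t hi)
{
    return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* Hypothetical example: smallest and largest element of v (n >= 1),
   returned together instead of through pointer arguments. */
uint64_t minmax_u32(const uint32_t *v, size_t n)
{
    uint32_t mn = v[0], mx = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] < mn) mn = v[i];
        if (v[i] > mx) mx = v[i];
    }
    return pack2(mn, mx);
}
```

The caller unpacks with (uint32_t)r for the low word and (uint32_t)(r >> 32) for the high word, which on ARM should amount to reading r0 and r1. Whether both values really stay in registers end to end is worth checking in the generated assembly.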
Some restrictions of the APCS don't even have any positive effect on the GBA or NDS, like forcing the stack pointer to be aligned to an 8-byte boundary.
Inlining a function removes the call altogether, so the compiler is no longer bound by the calling convention at that point and can decide for itself whether it has enough registers available to avoid using the stack.
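A minimal sketch of a helper that benefits from inlining, assuming GCC-style optimisation is enabled. The names clamp_s32 and clamp_buffer are hypothetical. Note that static inline is only a hint; the compiler still decides whether to expand the call.

```c
#include <stdint.h>

/* Small helper the compiler can expand at each call site. When it is
   inlined there is no call boundary, so the argument/return register
   rules don't apply and the optimiser allocates registers across the
   whole loop. */
static inline int32_t clamp_s32(int32_t x, int32_t lo, int32_t hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Hypothetical caller: clamp every sample in a buffer to [-255, 255]. */
void clamp_buffer(int32_t *buf, int n)
{
    for (int i = 0; i < n; i++)
        buf[i] = clamp_s32(buf[i], -255, 255);
}
```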
When you write your own functions in ASM, you decide how the registers are used: which are inputs, which are outputs, which must stay unchanged, and which may be trashed.
The compiler also misses some specific optimizations:
* Division by a variable is slow, because the GBA's and DS's ARM CPUs have no divide instruction and the compiler has to call a slow division routine. If you divide by the same value repeatedly, compute its reciprocal once and multiply by that instead. Dividing by a constant is okay (the compiler can turn it into a multiply), and dividing by a constant power of 2 becomes a shift.
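* It can't reorder its register allocation to suit LDMIA/STMIA, which always transfer registers in ascending order (the lowest-numbered register to the lowest address).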
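* Long multiplication (32x32 to 64-bit) is REALLY bad in THUMB mode, because THUMB has no long-multiply instructions and the compiler has to call a library routine. Use ARM mode for that code instead.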
* Short functions that do nothing except call another function with a fixed argument are not reduced to a mov followed by a branch.
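A minimal sketch of the reciprocal trick for the division point, assuming 16-bit inputs and a 16.16 fixed-point reciprocal. The helper name div_all_by is hypothetical. One real division computes the reciprocal; each element then costs a multiply and a shift. Because the reciprocal is rounded up, a result can be one larger than the exact quotient for large inputs (it is exact whenever x * (d - 1) < 65536).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: divide every element of v by the same divisor d
   (d >= 1) using one division plus a multiply and shift per element.
   Inputs are limited to 16 bits so x * recip always fits in 32 bits. */
void div_all_by(uint16_t *v, size_t n, uint16_t d)
{
    /* ceil(2^16 / d); at most 65536, reached when d == 1 */
    uint32_t recip = (65536u + d - 1u) / d;
    for (size_t i = 0; i < n; i++)
        v[i] = (uint16_t)((v[i] * recip) >> 16);
}
```

Keeping the product to 32 bits is deliberate: a full-precision version would need a 64-bit product, which is exactly the long multiply that is expensive in THUMB code.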
#177105 - blessingta@hotmail.co.uk - Sun Dec 11, 2011 8:51 pm
I just meant to ask in which areas "C" produces less efficient assembly code than writing it by hand. I'm trying to discuss this in my report.
#177112 - Miked0801 - Mon Dec 12, 2011 7:51 pm
I've found that C does a good job overall - good enough that you rarely have to worry about it. Where it does worse though is:
* Quite often it pushes and pops registers it doesn't need to.
* C doesn't usually have direct access to certain opcodes like 'clz', though vendor-specific intrinsics address this.
* C doesn't have a rotate left/right command without intrinsics.
* C has no concept of the CPU flags, such as zero and carry, and no way of getting at them. There are a few distinct cases where you can use these flags for small-scale optimization.
* C will generally have more frame/stack overhead than straight assembler. Asm code can make assumptions about register use and jumps that the compiler cannot. This is actually one of the larger areas of gain to be had by hand-coding. It also makes the code harder to debug and maintain, so beware.
* C code doesn't necessarily copy large structures efficiently. You have to be very careful to get optimal performance when copying one structure to another. memcpy() is not usually the fastest way of copying data, and the compiler will rarely use all the registers available to it when copying with the block-copy instructions (LDMIA/STMIA). That said, if the compiler isn't using those registers, it's worth taking a very close look at the CPU's documentation to make sure the compiler writers didn't do it on purpose.
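A minimal sketch of the usual workarounds for the rotate and clz points, assuming GCC or Clang. The rotate uses the shift-and-OR idiom these compilers typically recognise and turn into a single ROR on ARM. __builtin_clz is a real GCC/Clang builtin; on ARMv5 and later (such as the DS's ARM9) it should map to the CLZ instruction, while on the GBA's ARMv4T core, which has no CLZ, it typically becomes a library call. The names rotr32 and highest_bit are hypothetical.

```c
#include <stdint.h>

/* Rotate right by n bits. Masking the count keeps every shift in the
   0..31 range, so n == 0 (and n == 32) are well defined. */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << (-n & 31));
}

/* Index of the highest set bit, or -1 for zero.
   __builtin_clz(0) is undefined, hence the explicit check. */
static inline int highest_bit(uint32_t x)
{
    return x ? 31 - __builtin_clz(x) : -1;
}
```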
Programmers who write compilers are very, very bright individuals. You want to be really sure your "optimization" is better than their code. I've been burned a few times by writing assembler that I thought was better, only to find out the compiler outperformed me because it understood the architecture better than I.