#171616 - Rajveer - Sat Dec 05, 2009 3:52 am
Yeah, I remember reading somewhere on these forums that divides/multiplies by a power of 2 get turned into a bit shift by the compiler. Not sure though.
#171618 - Dwedit - Sat Dec 05, 2009 4:25 am
Division by a constant is optimized, division by a variable is not.
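For example (rough sketch of my own, exact output depends on target and flags):

Code:
/* Division by a constant: GCC strength-reduces it.  A power of two
   becomes a shift; other constants become a multiply-by-reciprocal
   plus shifts, with no divide instruction at all. */
unsigned int div_by_const(unsigned int x)
{
    return x / 16;      /* typically just: lsr r0, r0, #4 */
}

/* Division by a variable: nothing to precompute, so on ARM cores
   without a hardware divider (ARM7TDMI, the DS's ARM9) this turns
   into a call to a runtime routine like __aeabi_uidiv. */
unsigned int div_by_var(unsigned int x, unsigned int y)
{
    return x / y;
}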
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."
#171628 - sajiimori - Sun Dec 06, 2009 1:00 am
Also, even when the compiler optimizes division by a constant power of 2, it's only a simple shift if the left-hand side is an unsigned value. If it's signed, extra opcodes are generated.
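Roughly, the compiler has to do something like this for the signed case, because C division truncates toward zero while an arithmetic shift rounds toward negative infinity (a sketch of the equivalent C, not the exact instructions, and it assumes GCC's arithmetic right shift on signed ints):

Code:
unsigned int udiv8(unsigned int x)
{
    return x / 8;               /* one shift */
}

int sdiv8(int x)
{
    /* Negative inputs need a bias of (8 - 1) before shifting,
       otherwise e.g. -1 / 8 would come out as -1 instead of 0. */
    int bias = (x >> 31) & 7;   /* 7 if x < 0, else 0 */
    return (x + bias) >> 3;
}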
#171637 - sajiimori - Mon Dec 07, 2009 9:03 pm
Really interesting doc! But I wouldn't give it to a junior programmer without discussing each point with them. Almost every statement in the presentation can very easily be misinterpreted or overgeneralized.
Case in point, the caveat I already mentioned (about signed integer strength reduction) is omitted from the PDF. It just declares an unsigned int and says it's fast, but doesn't mention that it's slower for signed values.
It's very possible that the finer points were noted by the speaker, without being on the slides.
#171642 - Exophase - Mon Dec 07, 2009 11:18 pm
GCC isn't nearly as good on ARM as it is on x86. Beating GCC on ARM is easy (IMO anyway). It has been improving over the years, but pretty slowly.
Compilers may know tricks that you don't, and they might be better than you at scheduling for complex architectures. But there are a lot of things they're just not as good at, generally. And no matter how good a compiler is, it won't have the degree of insight into a problem that the programmer often will. Runtime profiling can help, but only so much, and really it's a pain in the ass.
Some optimizations are just easier to write in ASM. Sometimes you can coerce the compiler into generating what you want, but at pretty great cost in effort, code clarity, or other optimizations. For instance, in order to get GCC to sibcall so I could get tail recursion without my stack blowing up (yes, relying on an optimization, but never mind that right now), I had to put the body of the function in another function, non-static so it didn't get inlined. Otherwise GCC would fail to match the stack frames because of the locals, despite the fact that the locals are all dead by the time the sibcall is made. GCC could be smarter than this, but the fact is that it isn't. This was for x86, for what it's worth - I don't usually look at the ASM generated on x86, but sometimes I have to. On ARM I look at it all the time.
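Structurally it was something like this (contrived sketch, not the actual code - compile with -O2, or at least -foptimize-sibling-calls):

Code:
struct node { int value; struct node *next; };

/* The body lives in a separate, non-static function so GCC won't
   inline it back in and drag its locals into the caller's frame. */
void process_one(struct node *n)
{
    int tmp = n->value * 2;     /* locals only exist in this frame */
    /* ... use tmp ... */
    (void)tmp;
}

void walk(struct node *n)
{
    if (!n)
        return;
    process_one(n);
    walk(n->next);   /* no live locals here, so GCC can emit a sibcall
                        (a jump) and the stack doesn't grow */
}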
I think my main beef with this article is precisely that it does seem to assume all coding is done on cutting-edge, out-of-order, deeply superscalar, deeply pipelined CPUs, when all sorts of coding is done on simpler embedded CPUs, and it's these CPUs that NEED the more detailed approach to optimization. The ARM9 on the DS, for instance, has relatively simple static scheduling, so the compiler really doesn't have an advantage over an ARM coder worth much of anything. On the other hand, the compiler isn't that amazing at folding memory increments, or even at optimal register allocation and various platform-specific tricks.
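By folding I mean the usual buffer walk (hypothetical example): a hand-written ARM loop would use post-indexed addressing like "ldr r3, [r0], #4" so the pointer update is free, which GCC doesn't always manage once the loop body grows.

Code:
unsigned int sum(const unsigned int *p, int n)
{
    unsigned int total = 0;
    while (n--)
        total += *p++;   /* ideally one ldr with writeback per iteration */
    return total;
}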
I do agree with the article that memory hierarchy is very important, but again, it's exaggerated here for really high-MHz platforms with deep memory hierarchies that quickly hit large cycle-count latencies. It matters on DS, especially with only a little bit of L1 cache, but main memory just isn't as slow, relatively speaking, as it is on modern x86. To say that this kind of optimization is all that matters anymore is just insane. You can still make huge wins shaving off cycles here and there in inner loops, purely on the non-memory side.
Finally, I think the dogma of "profile before doing anything" is overstated. Here I take profiling to mean using an automated method of finding which pieces of code are hit most often. The process itself can introduce its own distortion that skews the results. I'm not referring to benchmarking in general, which is always useful. As general advice to a general audience, "always profile" is probably good, because I think a lot of people don't have enough intuition to reliably know which parts of their program need optimization, but as far as I'm concerned some things are just going to be dead obvious.
Basically, I think people need to be ready for anything and willing to examine problems on a case-by-case basis. There has been backlash for a long time against the (incorrect) attitude that dropping down to ASM will result in better code than what a compiler generates - now what has replaced it is the equally incorrect attitude that the compiler is always perfect and you're wasting your time trying to beat it. But yes, actually checking performance before and after doing anything is a no-brainer.