gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies. A new forum can be found here.

DS development > Auto Optimisation

#171615 - Azenris - Sat Dec 05, 2009 2:57 am

I can't remember if I read this or if it's just something I made up in my head that's completely not true!

When compiling, do the divides and multiplies in your program automatically get changed into bit shifts where possible?

Anyway, just asking; I can't remember if/where I read it.
_________________
My Homebrew Games

#171616 - Rajveer - Sat Dec 05, 2009 3:52 am

Yeah, I remember reading somewhere on these forums that divides/multiplications by a power of 2 get turned into a bit shift by the compiler. Not sure though.

#171617 - Azenris - Sat Dec 05, 2009 3:57 am

Ahh ok thanks.

Yeah, I think I read it here too, somewhere.
_________________
My Homebrew Games

#171618 - Dwedit - Sat Dec 05, 2009 4:25 am

Division by a constant is optimized, division by a variable is not.
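
A minimal C sketch of the difference (illustrative, not from the original post; function names are made up):

Code:
    /* Division by a constant power of two: at -O1 and above, GCC can
       reduce this to a single logical shift right. */
    unsigned div_by8(unsigned x) { return x / 8; }

    /* Division by a non-power-of-two constant is also optimized, via a
       multiply-by-magic-reciprocal sequence instead of a shift. */
    unsigned div_by10(unsigned x) { return x / 10; }

    /* Division by a variable: the divisor isn't known at compile time,
       so on the DS's ARM cores (no divide instruction) this becomes a
       call to a runtime routine such as __aeabi_uidiv. */
    unsigned div_var(unsigned x, unsigned y) { return x / y; }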
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."

#171623 - ninjalj - Sat Dec 05, 2009 12:12 pm

Rajveer wrote:
Yeah, I remember reading somewhere on these forums that divides/multiplications by a power of 2 get turned into a bit shift by the compiler. Not sure though.


On ARM, multiplication by 2^n-1 and 2^n+1 should also be changed to add/rsb instructions (IIRC), and on x86, anything that is synthesizable with LEA.
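
A quick sketch of what that looks like (illustrative; the commented assembly is the kind of code GCC typically emits at -O2):

Code:
    /* 9 = 2^3 + 1: a single add with a shifted operand. */
    int mul9(int x) { return x * 9; }   /* ARM: add r0, r0, r0, lsl #3 */
                                        /* x86: lea eax, [rdi+rdi*8]   */

    /* 7 = 2^3 - 1: a single reverse-subtract with a shifted operand. */
    int mul7(int x) { return x * 7; }   /* ARM: rsb r0, r0, r0, lsl #3 */
                                        /* x86: lea eax, [rdi*8]
                                                sub eax, edi            */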

#171627 - Abcd1234 - Sat Dec 05, 2009 11:54 pm

Azenris wrote:
I can't remember if I read this or if it's just something I made up in my head that's completely not true!

When compiling, do the divides and multiplies in your program automatically get changed into bit shifts where possible?

Anyway, just asking; I can't remember if/where I read it.


Yes, that's called strength reduction, and with gcc, traditionally it's included in the set of optimizations enabled with -O2.

#171628 - sajiimori - Sun Dec 06, 2009 1:00 am

Also, even when the compiler optimizes division by a constant power of 2, it's only a single shift if the left-hand side is an unsigned value. If it's signed, extra opcodes are generated to get the rounding towards zero right.
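
A sketch of the difference (illustrative; compile with something like arm-eabi-gcc -O2 -S to see the real output):

Code:
    /* Unsigned: one logical shift right. */
    unsigned udiv4(unsigned x) { return x / 4; }
    /*  mov r0, r0, lsr #2 */

    /* Signed: C requires rounding towards zero, but an arithmetic shift
       rounds towards negative infinity, so the compiler must first bias
       negative values - roughly: */
    int sdiv4(int x) { return x / 4; }
    /*  mov r3, r0, asr #31      @ r3 = all ones if x < 0, else 0
        add r0, r0, r3, lsr #30  @ adds 3 to negative x, 0 otherwise
        mov r0, r0, asr #2 */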

#171629 - Quirky - Sun Dec 06, 2009 5:26 pm

You might find this article interesting: "state of the art C compiler optimization tricks", http://lambda-the-ultimate.org/node/3674

And here is the PDF on Google Docs.

It focuses on x86, but arm-eabi-gcc gives similar results with a few of the examples when I tried them.

#171637 - sajiimori - Mon Dec 07, 2009 9:03 pm

Really interesting doc! But I wouldn't give it to a junior programmer without discussing each point with them. Almost every statement in the presentation can very easily be misinterpreted or overgeneralized.

Case in point, the caveat I already mentioned (about signed integer strength reduction) is omitted from the PDF. It just declares an unsigned int and says it's fast, but doesn't mention that it's slower for signed values.

It's very possible that the finer points were noted by the speaker, without being on the slides.

#171638 - Abcd1234 - Mon Dec 07, 2009 9:23 pm

sajiimori wrote:
Really interesting doc! But I wouldn't give it to a junior programmer without discussing each point with them. Almost every statement in the presentation can very easily be misinterpreted or overgeneralized.

Case in point, the caveat I already mentioned (about signed integer strength reduction) is omitted from the PDF. It just declares an unsigned int and says it's fast, but doesn't mention that it's slower for signed values.

It's very possible that the finer points were noted by the speaker, without being on the slides.


While that's certainly true, I think the general message is simple and accurate: don't micro-optimize unless you've profiled the code, both before and after your optimization. Odds are that a) the compiler will do a better job than you, since it's surprisingly clever and will likely take advantage of details of the underlying architecture that you aren't even aware of (the Opteron branch-prediction workaround is a great example), and b) if you do try to be clever, you might just end up tricking the compiler into generating *less* optimal code.

#171642 - Exophase - Mon Dec 07, 2009 11:18 pm

GCC isn't nearly as good on ARM as it is on x86. Beating GCC on ARM is easy (IMO, anyway). It's been improving over the years, but still pretty slowly.

Compilers may know tricks that you don't, and they might be better at scheduling for complex architectures than you are. But there are a lot of things they're just not as good at, generally. And no matter how good a compiler is, it won't have the degree of insight into a problem that the programmer often will. Runtime profiling can help, but only so much, and really it's a pain in the ass.

Some optimizations are just easier to write in ASM. Sometimes you can coerce the compiler into generating what you want, but at great difficulty and at the expense of code clarity or other optimizations. For instance, to get GCC to emit a sibcall so I could get tail recursion without my stack blowing up (yes, relying on an optimization, but never mind that right now), I had to put the body of the function in another, non-static function so it didn't get inlined. Otherwise GCC would fail to match the stack frames because of the locals, despite the fact that the locals are all dead by the time the sibcall is made. GCC could be smarter than this, but the fact is that it isn't. This was on x86, for what it's worth - I don't usually look at the ASM generated on x86, but sometimes I have to. On ARM I look at it all the time.
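
A hypothetical reconstruction of that shape (names and details made up; the point is that the recursive function's frame holds no locals):

Code:
    struct node { struct node *next; int value; };

    /* Non-static so, per the trick described above, GCC keeps it out of
       line and the locals stay out of the recursive function's frame. */
    int visit_one(struct node *n)
    {
        int scratch[32];            /* the kind of locals that blocked the sibcall */
        scratch[0] = n->value;
        return scratch[0] < 0;
    }

    int visit_all(struct node *n)
    {
        if (!n)
            return 0;
        if (visit_one(n))
            return -1;
        return visit_all(n->next);  /* tail position: GCC at -O2 can emit this
                                       as a sibcall, so deep lists don't blow
                                       the stack */
    }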

I think my main beef with this article is precisely that it seems to assume all coding is done on cutting-edge, out-of-order, deeply superscalar, deeply pipelined CPUs, when all sorts of coding is done on simpler embedded CPUs, and it's exactly these CPUs that NEED close attention to optimization detail. The ARM9 in the DS, for instance, has relatively simple static scheduling, so the compiler really doesn't have an advantage over an ARM coder worth much of anything. On the other hand, the compiler isn't that amazing at folding memory increments, or even at optimal register allocation and various platform-specific tricks.
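
For reference, "folding memory increments" means using ARM's post-increment addressing modes; a small sketch:

Code:
    /* Each *src++ / *dst++ can fold into a single load/store with
       post-increment, e.g.  ldr r3, [r1], #4  then  str r3, [r0], #4.
       A compiler that misses the fold emits a separate add per pointer. */
    void copy_words(unsigned *dst, const unsigned *src, int n)
    {
        while (n--)
            *dst++ = *src++;
    }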

I do agree with the article that the memory hierarchy is very important, but again, it's exaggerated here for really high-MHz platforms with deep memory hierarchies that quickly hit large cycle-count latencies. It matters on the DS, especially with only a little L1 cache, but main memory just isn't as slow, relatively speaking, as it is on modern x86. To say that this kind of optimization is all that matters anymore is just insane. You can still make huge wins shaving off cycles here and there in inner loops, purely in non-memory code.

Finally, I think the dogma of "profile before doing anything" is overstated. Here I take profiling to mean using an automated method of finding which pieces of code are hit most often. The process can itself perturb the program and skew the results. I'm not referring to benchmarking in general, which is always useful. As general advice to a general audience, "always profile" is probably good, because I think a lot of people don't have enough intuition to reliably know which parts of their program need optimization, but as far as I'm concerned, some things are just going to be dead obvious.

Basically, I think people need to be ready for anything and willing to examine problems on a case-by-case basis. There has long been a backlash against the (incorrect) attitude that dropping down to ASM will always result in better code than what a compiler generates - and what has replaced it is the equally incorrect attitude that the compiler is always perfect and you're wasting your time trying to beat it. But yes, actually checking performance before and after doing anything is a no-brainer.

#171644 - sajiimori - Tue Dec 08, 2009 12:47 am

Well said. =)