#139625 - dushan42 - Sat Sep 08, 2007 10:27 am
Hi,
I've noticed that the code generated for my member function pointer calls is quite hairy. It seems that gcc fails to optimise calls even when it knows that the function's class doesn't use multiple / virtual inheritance (at least not for Thumb).
For example:
Code:
class SomeClass;
typedef void (SomeClass::*MemberFunc)();

void callMemberFunc(SomeClass *object, MemberFunc func)
{
    (object->*func)();
}
In this case you'd expect the call to be pretty complicated as the compiler doesn't know anything about SomeClass. Both gcc and Visual Studio understandably generate equally horrid code. However if I change the first line to:
Code:
class SomeClass {};
I'd expect the compiler to get smart about the call as it knows multiple inheritance definitely isn't involved. Visual Studio does - the resulting code is exactly what I want:
Code:
(object->*func)();
00424D40 mov ecx,dword ptr [esp+4]
00424D44 jmp dword ptr [esp+8]
Unfortunately the Thumb code generated by gcc is identical in both cases:
Code:
00000000 <_Z14callMemberFuncP9SomeClassMS_FvvE>:
   0: b510      push {r4, lr}
   2: 1c04      mov r4, r0 (add r4, r0, #0)
   4: 07d3      lsl r3, r2, #31
   6: d407      bmi 18 <_Z14callMemberFuncP9SomeClassMS_FvvE+0x18>
   8: 1c0b      mov r3, r1 (add r3, r1, #0)
   a: 1050      asr r0, r2, #1
   c: 1820      add r0, r4, r0
   e: fffef7ff  bl 0 <_call_via_r3>
  12: bc10      pop {r4}
  14: bc01      pop {r0}
  16: 4700      bx r0
  18: 1050      asr r0, r2, #1
  1a: 5903      ldr r3, [r0, r4]
  1c: 185b      add r3, r3, r1
  1e: 681b      ldr r3, [r3, #0]
  20: e7f4      b c <_call_via_r3+0xc>
  22: 46c0      nop (mov r8, r8)
Ouch!
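Reading the listing, the sequence appears to follow the generic pointer-to-member-function layout from the ARM C++ ABI: a pair of words, here in r1 and r2, where the low bit of the second word flags a virtual function and the remaining bits hold the `this` adjustment. Below is a rough C++ model of what the instructions seem to do. The names `Thunk`, `MemberPtrModel` and `callModel` are made up for illustration and are not anything gcc exposes; treat it as a sketch of the logic, not as the compiler's actual implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for "a function taking the this pointer".
typedef void (*Thunk)(void *self);

// Hypothetical model of the two-word member function pointer (r1, r2 above).
struct MemberPtrModel {
    std::uintptr_t ptr;  // function address, or byte offset into the vtable
    std::ptrdiff_t adj;  // (this adjustment << 1) | is_virtual
};

inline void callModel(void *object, MemberPtrModel mp) {
    // asr r0, r2, #1 ; add r0, r4, r0  -> adjusted this pointer
    char *self = static_cast<char *>(object) + (mp.adj >> 1);
    Thunk target;
    if (mp.adj & 1) {
        // lsl r3, r2, #31 ; bmi  -> virtual path:
        // load the vtable pointer from the adjusted object, then the slot.
        const char *vtable;
        std::memcpy(&vtable, self, sizeof vtable);
        std::memcpy(&target, vtable + mp.ptr, sizeof target);
    } else {
        // Non-virtual path: ptr is the function address itself.
        target = reinterpret_cast<Thunk>(mp.ptr);
    }
    target(self);  // bl _call_via_r3
}
```

With an empty `SomeClass` the virtual branch and the adjustment can never be needed, which is why the two-instruction Visual Studio output is what I was hoping for.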
I've tried this with a bunch of different versions of gcc (4.1.1, 4.1.0, 4.0.1, 3.4.4) and none of them generated optimised code when SomeClass was defined at the time of the call.
Any ideas why gcc does this? Is there a way to get more optimal code out of it through magic command line args or massaging the code?
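One way of massaging the code that may be worth trying is to bind the member function as a template argument, so that it is a compile-time constant, and to pass a plain function pointer around instead of a member function pointer. This is only a sketch and has not been checked against the gcc versions listed above; `PlainFunc`, `memberThunk`, `callPlainFunc` and the example class `Counter` are hypothetical names.

```cpp
// Hypothetical workaround sketch: all names here are illustrative only.
typedef void (*PlainFunc)(void *object);

// Func is a template argument, so inside the thunk it is a known constant
// and the call no longer needs the runtime virtual/adjustment test.
template <class T, void (T::*Func)()>
void memberThunk(void *object) {
    (static_cast<T *>(object)->*Func)();
}

// The call site now goes through an ordinary function pointer.
inline void callPlainFunc(void *object, PlainFunc func) {
    func(object);
}

// Example class used only to demonstrate the thunk.
struct Counter {
    int n;
    Counter() : n(0) {}
    void bump() { ++n; }
};
```

Usage would look like `callPlainFunc(&c, &memberThunk<Counter, &Counter::bump>);`. The trade-off is that the member function has to be known at compile time where the thunk is instantiated, and the call goes through one extra function unless the compiler inlines the member into the thunk.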
Thanks,
Dushan