gbadev.org forum archive

Hi,

I've noticed that the code generated for my member function pointer calls is quite hairy. It seems that gcc fails to optimise calls even when it knows that the function's class doesn't use multiple / virtual inheritance (at least not for Thumb).

For example:

Code:

class SomeClass;

typedef void (SomeClass::*MemberFunc)();

void callMemberFunc(SomeClass *object, MemberFunc func)
{
    (object->*func)();
}

In this case you'd expect the call to be pretty complicated, as the compiler doesn't know anything about SomeClass. Both gcc and Visual Studio understandably generate equally horrid code. However, if I change the first line to:

Code:

class SomeClass {};

I'd expect the compiler to get smart about the call as it knows multiple inheritance definitely isn't involved. Visual Studio does - the resulting code is exactly what I want:

Code:

(object->*func)();
00424D40 mov ecx,dword ptr [esp+4]
00424D44 jmp dword ptr [esp+8]

Unfortunately the Thumb code generated by gcc is identical in both cases:

Code:

00000000 <_Z14callMemberFuncP9SomeClassMS_FvvE>:
0: b510 push {r4, lr}
2: 1c04 mov r4, r0    (add r4, r0, #0)
4: 07d3 lsl r3, r2, #31
6: d407 bmi 18 <_Z14callMemberFuncP9SomeClassMS_FvvE+0x18>
8: 1c0b mov r3, r1    (add r3, r1, #0)
a: 1050 asr r0, r2, #1
c: 1820 add r0, r4, r0
e: fffef7ff bl 0 <_call_via_r3>
12: bc10 pop {r4}
14: bc01 pop {r0}
16: 4700 bx r0
18: 1050 asr r0, r2, #1
1a: 5903 ldr r3, [r0, r4]
1c: 185b add r3, r3, r1
1e: 681b ldr r3, [r3, #0]
20: e7f4 b c <_call_via_r3+0xc>
22: 46c0 nop       (mov r8, r8)

Ouch!

I've tried this with a bunch of different versions of gcc (4.1.1, 4.1.0, 4.0.1, 3.4.4) and none of them generated optimised code when SomeClass was defined at the time of the call.
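
As a reading aid for the Thumb listing above, here is a rough C++ model of the dispatch it appears to perform. It assumes the ARM flavour of gcc's member function pointer layout (two words, with the low bit of the second word flagging a virtual function), which is what the lsl r3, r2, #31 / bmi pair suggests. The names MemberFuncRep, callViaRep, FakeObject, addOne and addTen are made up for illustration, and the vtable is simulated by hand rather than taken from a real class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for a compiled member function: takes `this` as a plain pointer.
using RawFunc = void (*)(void* self);

// Assumed two-word layout, mirroring r1 (ptr) and r2 (adj) in the listing.
struct MemberFuncRep {
    std::uintptr_t ptr;  // function address, or byte offset into the vtable
    std::ptrdiff_t adj;  // (this adjustment << 1) | 1 if virtual
};

void callViaRep(void* object, MemberFuncRep f) {
    assert(object != nullptr);
    // asr r0, r2, #1 ; add r0, r4, r0  -> adjusted `this`
    char* self = static_cast<char*>(object) + (f.adj >> 1);
    RawFunc target;
    if (f.adj & 1) {
        // Virtual path (0x18..0x20): load the vtable pointer stored at the
        // start of the adjusted object, then the slot at vtable + ptr.
        char* vtable;
        std::memcpy(&vtable, self, sizeof vtable);
        std::memcpy(&target, vtable + f.ptr, sizeof target);
    } else {
        // Non-virtual path (0x8): ptr is the function address itself.
        target = reinterpret_cast<RawFunc>(f.ptr);
    }
    target(self);  // bl _call_via_r3
}

// Hand-built object with a simulated vtable pointer as its first word.
struct FakeObject {
    RawFunc* vtable;
    int total;
};

void addOne(void* self) { static_cast<FakeObject*>(self)->total += 1; }
void addTen(void* self) { static_cast<FakeObject*>(self)->total += 10; }
```

With adj equal to 0 the stored address is called directly; with the low bit set, the target is first fetched from the table. In the listing both paths are present at the call site, because the test on the low bit happens at run time.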

Any ideas why gcc does this? Is there a way to get more optimal code out of it through magic command line args or massaging the code?

Thanks,

Dushan

Read this article: http://www.codeproject.com/cpp/FastDelegate.asp. It has everything you wanted to know about pointers to member functions: why there are so many issues with them, and how different compilers implement them.

It's a good read, highly recommended! :)

Good link; added to the MUL

The article on FastDelegates is a very good read indeed - thanks for the link.

Unfortunately it doesn't mention any problems with gcc optimization. It's interesting that FastDelegates work without knowing anything about the class at call time:

Code:

#include "FastDelegate.h"

using namespace fastdelegate;

typedef FastDelegate<void(void)> SomeDelegate;

void callDelegate(SomeDelegate &delegate)
{
    delegate();
}

Compiled with Visual Studio, this yields really efficient code:

Code:

void callDelegate(SomeDelegate &delegate)
{
delegate();
00424D70 mov eax,dword ptr [esp+4]
00424D74 mov ecx,dword ptr [eax]
00424D76 mov eax,dword ptr [eax+4]
00424D79 jmp eax
}

Classes with multiple / virtual inheritance are detected at bind time and a stub function is generated for those - it's all very clever :). It does all come down to an MFP call, however, so with gcc I still end up with inefficient code:

Code:

00000000 <_Z12callDelegateRN12fastdelegate12FastDelegateIFvvEEE>:
0: b500 push {lr}
2: 6883 ldr r3, [r0, #8]
4: b081 sub sp, #4
6: 6801 ldr r1, [r0, #0]
8: 6842 ldr r2, [r0, #4]
a: 07d8 lsls r0, r3, #31
c: d406 bmi.n 1c <_Z12callDelegateRN12fastdelegate12FastDelegateIFvvEEE+0x1c>
e: 105b asrs r3, r3, #1
10: 18c8 adds r0, r1, r3
12: f000 f808 bl 26 <_Z12callDelegateRN12fastdelegate12FastDelegateIFvvEEE+0x26>
16: b001 add sp, #4
18: bc01 pop {r0}
1a: 4700 bx r0
1c: 105b asrs r3, r3, #1
1e: 18c8 adds r0, r1, r3
20: 6803 ldr r3, [r0, #0]
22: 589a ldr r2, [r3, r2]
24: e7f5 b.n 12 <_Z12callDelegateRN12fastdelegate12FastDelegateIFvvEEE+0x12>
26: 4710 bx r2

Interestingly enough, this isn't an ARM/Thumb problem as I assumed - MinGW (gcc version 3.4.2) generates similarly inefficient code for x86. I did a whole bunch of googling for the problem but so far with no luck. I refuse to believe this is a gcc limitation - especially now that you got me hooked on delegates! :)

Just one question, what is it jumping to? Could it be jumping to specific controller code or something?

Do you mean the Visual Studio generated code? I did some tests and it depends on what function is assigned to the member function pointer.

If you assign a virtual function, VS generates a tiny stub that looks it up in the vtable and then calls the correct function. The jmp instruction goes into the stub - so in terms of speed it's probably comparable to gcc code.

The beauty of the VS optimisation is that for a non-virtual function the jmp is direct - it's as fast as a good old-fashioned C function pointer.
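
One way to massage the code, as asked earlier in the thread, is to avoid a runtime member function pointer altogether: pass the member function as a template argument and store a plain function pointer to a small static thunk. This is a sketch of a commonly used technique, not something taken from FastDelegate.h or confirmed by the posters here; Callback, methodThunk, makeCallback and Counter are hypothetical names. Whether gcc turns the thunk body into a direct call or inlines it depends on the compiler version and optimisation flags, so it is worth checking the disassembly.

```cpp
#include <cassert>

// Hypothetical callback type: an object pointer plus a plain C-style
// function pointer, so the call site never sees a member function pointer.
struct Callback {
    void* object;
    void (*thunk)(void*);
    void operator()() const { thunk(object); }
};

// Method is a compile-time constant here, so the compiler knows exactly
// which function is being called inside the thunk.
template <class T, void (T::*Method)()>
void methodThunk(void* object) {
    (static_cast<T*>(object)->*Method)();
}

// Builds a Callback bound to `object` and the member function `Method`.
template <class T, void (T::*Method)()>
Callback makeCallback(T* object) {
    assert(object != nullptr);
    return Callback{object, &methodThunk<T, Method>};
}

// Small example class used to exercise the callback.
struct Counter {
    int value = 0;
    void increment() { ++value; }
};
```

Usage would look like makeCallback<Counter, &Counter::increment>(&counter), after which invoking the callback is an ordinary indirect call through a function pointer. The trade-off is that the member function has to be named at compile time, and there may be one extra call through the thunk.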

C/C++ > Slow member function pointers

#139625 - dushan42 - Sat Sep 08, 2007 10:27 am

#139649 - col - Sat Sep 08, 2007 4:09 pm

#139652 - keldon - Sat Sep 08, 2007 4:38 pm

#139674 - dushan42 - Sat Sep 08, 2007 8:16 pm

#139686 - keldon - Sat Sep 08, 2007 9:33 pm

#139717 - dushan42 - Sun Sep 09, 2007 7:57 am