gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies.

C/C++ > Slow member function pointers

#139625 - dushan42 - Sat Sep 08, 2007 10:27 am

Hi,

I've noticed that the code generated for my member function pointer calls is quite hairy. It seems that gcc fails to optimise these calls even when it knows that the function's class doesn't use multiple or virtual inheritance (at least when targeting Thumb).

For example:
Code:
class SomeClass;

typedef void (SomeClass::*MemberFunc)();

void callMemberFunc(SomeClass *object, MemberFunc func)
{
   (object->*func)();
}


In this case you'd expect the call to be pretty complicated, as the compiler doesn't know anything about SomeClass. Both gcc and Visual Studio understandably generate equally horrid code. However, if I change the first line to:

Code:
class SomeClass {};


I'd expect the compiler to get smart about the call, since it now knows multiple inheritance definitely isn't involved. Visual Studio does; the resulting code is exactly what I want:

Code:
   (object->*func)();
00424D40  mov         ecx,dword ptr [esp+4]
00424D44  jmp         dword ptr [esp+8]


Unfortunately the Thumb code generated by gcc is identical in both cases:

Code:
00000000 <_Z14callMemberFuncP9SomeClassMS_FvvE>:
   0:   b510         push   {r4, lr}
   2:   1c04         mov   r4, r0      (add r4, r0, #0)
   4:   07d3         lsl   r3, r2, #31
   6:   d407         bmi   18 <_Z14callMemberFuncP9SomeClassMS_FvvE+0x18>
   8:   1c0b         mov   r3, r1      (add r3, r1, #0)
   a:   1050         asr   r0, r2, #1
   c:   1820         add   r0, r4, r0
   e:   fffef7ff    bl   0 <_call_via_r3>
  12:   bc10         pop   {r4}
  14:   bc01         pop   {r0}
  16:   4700         bx   r0
  18:   1050         asr   r0, r2, #1
  1a:   5903         ldr   r3, [r0, r4]
  1c:   185b         add   r3, r3, r1
  1e:   681b         ldr   r3, [r3, #0]
  20:   e7f4         b   c <_call_via_r3+0xc>
  22:   46c0         nop         (mov r8, r8)


Ouch!

I've tried this with a bunch of different versions of gcc (4.1.1, 4.1.0, 4.0.1, 3.4.4) and none of them generated optimised code when SomeClass was defined at the time of the call.

Any ideas why gcc does this? Is there a way to get more optimal code out of it through magic command line args or massaging the code?

Thanks,

Dushan

#139649 - col - Sat Sep 08, 2007 4:09 pm

Read this article: http://www.codeproject.com/cpp/FastDelegate.asp. It has everything you wanted to know about pointers to member functions: why there are so many issues with them, and details about how different compilers implement them.

It's a good read; highly recommended! :)

#139652 - keldon - Sat Sep 08, 2007 4:38 pm

Good link; added to the MUL

#139674 - dushan42 - Sat Sep 08, 2007 8:16 pm

The article on FastDelegates is a very good read indeed - thanks for the link.

Unfortunately it doesn't mention any problems with gcc optimization. It's interesting that FastDelegates work without knowing anything about the class at call time:

Code:
#include "FastDelegate.h"

using namespace fastdelegate;

typedef FastDelegate<void(void)> SomeDelegate;

void callDelegate(SomeDelegate &delegate)
{
   delegate();
}


Compiled with Visual Studio, this yields really efficient code:
Code:
void callDelegate(SomeDelegate &delegate)
{
   delegate();
00424D70  mov         eax,dword ptr [esp+4]
00424D74  mov         ecx,dword ptr [eax]
00424D76  mov         eax,dword ptr [eax+4]
00424D79  jmp         eax 
}


Classes with multiple or virtual inheritance are detected at bind time, and a stub function is generated for those; it's all very clever :). It all still comes down to an MFP (member function pointer) call, however, so with gcc I still end up with inefficient code:

Code:
00000000 <_Z12callDelegateRN12fastdelegate12FastDelegateIFvvEEE>:
   0:   b500         push   {lr}
   2:   6883         ldr   r3, [r0, #8]
   4:   b081         sub   sp, #4
   6:   6801         ldr   r1, [r0, #0]
   8:   6842         ldr   r2, [r0, #4]
   a:   07d8         lsls   r0, r3, #31
   c:   d406         bmi.n   1c <_Z12callDelegateRN12fastdelegate12FastDelegateIFvvEEE+0x1c>
   e:   105b         asrs   r3, r3, #1
  10:   18c8         adds   r0, r1, r3
  12:   f000 f808    bl   26 <_Z12callDelegateRN12fastdelegate12FastDelegateIFvvEEE+0x26>
  16:   b001         add   sp, #4
  18:   bc01         pop   {r0}
  1a:   4700         bx   r0
  1c:   105b         asrs   r3, r3, #1
  1e:   18c8         adds   r0, r1, r3
  20:   6803         ldr   r3, [r0, #0]
  22:   589a         ldr   r2, [r3, r2]
  24:   e7f5         b.n   12 <_Z12callDelegateRN12fastdelegate12FastDelegateIFvvEEE+0x12>
  26:   4710         bx   r2


Interestingly enough, this isn't an ARM/Thumb-specific problem as I had assumed: MinGW (gcc version 3.4.2) generates similarly inefficient code for x86. I've done a whole bunch of googling for the problem, but so far with no luck. I refuse to believe this is a gcc limitation, especially now that you got me hooked on delegates! :)

#139686 - keldon - Sat Sep 08, 2007 9:33 pm

Just one question: what is it jumping to? Could it be jumping to specific controller code or something?

#139717 - dushan42 - Sun Sep 09, 2007 7:57 am

Do you mean the Visual Studio-generated code? I did some tests, and it depends on which function is assigned to the member function pointer.

If you assign a virtual function, VS generates a tiny stub that looks it up in the vtable and then calls the correct function. The jmp instruction goes to the stub, so in terms of speed it's probably comparable to the gcc code.

The beauty of the VS optimisation is that for a non-virtual function the jmp goes straight to the target; it's as fast as a good old-fashioned C function pointer.