gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

C/C++ > I'm trapped in a class!

#36250 - LOst? - Wed Feb 16, 2005 2:23 pm

It all started out in 2001. I was kinda competing with a friend over who knew the most C++. However, I only programmed C, and he was way ahead coding C++ things.

Now, in 2005, everything I do seems to be a class. For example, I'm coding a plane (background for storing tile numbers) manager, and I feel so dirty making it a class with a constructor to set the size, having the size stored as private, and deleting the plane buffer in the destructor.

Why do I feel dirty about it? Because last summer I used an old compiler for DOS called WATCOM C (used to create id Software's DOOM), and I thought that being open source now, under the name Open Watcom, would maybe make it more modern. DOOM was written in C btw.
I made a whole software engine in WATCOM C++, and it was so slow I vomited all over it. I mean I tried every possible optimization, but I couldn't speed things up. Then I noticed that moving private variables to public and accessing them directly changed the speed noticeably. So why have the variables private anyway, I thought?
But it didn't stop there... God, writing all these classes for nothing, when I could have the methods as C functions and get the speed the way it was supposed to be.

So, I moved the whole software engine project into Visual C++ 6, and added some DirectX for screen access.... With all the classes and everything, and wow the speed was incredible!

But still, I'm hurt by knowing that everything that really needs speed was written in C. Such as Microsoft Windows, VisualBoy Advance, Gens (Sega EMU).

My software engine is really good, but it can't run at full speed on a 450 MHz machine, like Gens can... And Gens is an emulator! For God's sake, why did I get into C++ when all it did was get me in trouble?

I'm stuck. Wherever I need something, I feel I need a constructor, a place where all my code for this object can have its own life, being private for everything else. That's why I can't leave C++. I'm doomed to be slow for the rest of my life I guess.

A typical class by me:
Code:

class Plane
{
private:
   int m_width;
   int m_height;
   int m_size;

   unsigned short* data;

public:
   Plane (int width, int height)
   {
      m_size = (width * height);

      data = new unsigned short [m_size];
      
      m_width = width;
      m_height = height;
   }

   ~Plane ()
   {
      delete [] data;
   }

   int ReturnWidth ()
   {
      return m_width;
   }

   int ReturnHeight ()
   {
      return m_height;
   }

   void SetData (int where, unsigned short entry)
   {
      if (where < 0 || where >= m_size)
         return;

      data [where] = entry;
   }

   unsigned short GetData (int where)
   {
      if (where < 0 || where >= m_size)
         return 0;

      return data [where];
   }
};

Should this be slow? So if it is, what's C++ good for? Writing this in C would require me to do functions and keep track of them like crazy.
And of course I will not have the methods inside the class in the end. This is just when I test the code.

#36258 - sasq - Wed Feb 16, 2005 5:27 pm

If you design carefully you should almost always be able to reach the same speeds with C++ as C.

In the example given I think you should:

Define the Return-functions as inline and const

In SetData(), only check indata if you need to debug:

Code:

inline void SetData (int where, unsigned short entry)
{
#ifdef DEBUG
      if (where < 0 || where >= m_size)
         return;
#endif
      data [where] = entry;
}


This should be as fast as just using an array in C (with a decent compiler at least).
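
A minimal sketch of sasq's two suggestions applied to a cut-down version of the Plane class from the first post. The name PlaneSketch, the m_data member name, the zero-filled buffer and the deleted copy operations are assumptions added for illustration; they are not in the original class.

```cpp
// Hypothetical cut-down version of the Plane class from the first post.
class PlaneSketch
{
public:
   PlaneSketch (int width, int height)
   :  m_width (width),
      m_height (height),
      m_size (width * height),
      m_data (new unsigned short [m_size] ())   // () zero-fills the buffer
   {
   }

   ~PlaneSketch () { delete [] m_data; }

   // Copying would delete the buffer twice, so this sketch forbids it.
   PlaneSketch (const PlaneSketch&) = delete;
   PlaneSketch& operator= (const PlaneSketch&) = delete;

   // const: callable on a const PlaneSketch.
   // Defined inside the class, so implicitly inline.
   int ReturnWidth () const { return m_width; }
   int ReturnHeight () const { return m_height; }

   void SetData (int where, unsigned short entry)
   {
#ifdef DEBUG
      if (where < 0 || where >= m_size)
         return;
#endif
      m_data [where] = entry;
   }

   unsigned short GetData (int where) const
   {
#ifdef DEBUG
      if (where < 0 || where >= m_size)
         return 0;
#endif
      return m_data [where];
   }

private:
   int m_width;
   int m_height;
   int m_size;
   unsigned short* m_data;
};
```

With the accessors marked const they can also be called through a const reference, which the original ReturnWidth and ReturnHeight would not allow.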

#36259 - poslundc - Wed Feb 16, 2005 5:55 pm

Functions defined within the scope of a class definition are automatically inlined, if they can be.

There's a good chance that your compiler didn't have function-inlining turned on. With GCC, you need to be using at least -O1 for class member functions to be inlined (-O3 for general inline functions).

If the accessor methods are being inlined into your code, they should be just as fast as accessing the member values directly.

(sasq's debug-tip is also a good way to improve performance by removing the bounds checking in your release; an ASSERT function is another good, clean way of handling this.)

Dan.

#36270 - sajiimori - Wed Feb 16, 2005 7:34 pm

Careful with that #ifdef back there. You run the risk of having a perfectly working debug build that becomes a release build with random memory corruption. If it's a failure condition, then assert.
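
A minimal sketch of the assert route, assuming the standard assert macro from <cassert>. SetTile is a hypothetical free function standing in for Plane::SetData. The release build (with NDEBUG defined) does the same unchecked write as the #ifdef version, but the debug build now stops loudly on a bad index instead of quietly skipping the write.

```cpp
#include <cassert>

// Hypothetical stand-in for Plane::SetData, written as a free function.
inline void SetTile (unsigned short* data, int size, int where, unsigned short entry)
{
   // Checked in debug builds; compiled out entirely when NDEBUG is defined.
   assert (where >= 0 && where < size);

   data [where] = entry;
}
```

Because an out-of-range index aborts the debug build, the bug gets found before it can turn into memory corruption in the release build.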

#36271 - Miked0801 - Wed Feb 16, 2005 7:36 pm

Older compilers will also not handle C++ as well. I prefer C, but that's because it's what I grew up with. No matter the language, general optimizing principles apply. One or two loops will usually eat 80-90% of your overhead, and those are where you need to concentrate your efforts. Even consider hand assembling some of the lowest-level, most-called routines. As an example, consider the code in a 3D engine which maps textures onto polys (or any other rendering). In beginner engines this is your biggest bottleneck and should be hand assembled or coded in C (or C++) in such a way that the code matches the hardware as well as possible. Later on, advanced culling techniques and such spread the load out a bit more (and are also candidates for optimization).

What does matching C code to hardware mean? It means knowing how many registers the CPU has so you know how many local vars it can keep without pushing. It means using 32-bit values on a 32-bit system even if the value may fit into a u16 or u8. It means looking at the assembler output of the C/C++ function to gauge how well the compiler is translating your code and trying different tricks to save instructions (true story here, I had a count down loop that my compiler wouldn't assemble efficiently. I tried for loops, do while, while, etc. The only way that finally worked 100% efficiently was to make it a while(1) loop with an if() break statement in it instead of counting down traditionally.) It means understanding alignment issues such that you can do the following to save space (use with caution!)

Code:

// Normal box code
typedef struct _BOX
{
    u8 tlx, brx, tly, bry;
    u32 flags;
} BOX;

BOX box1 = {0, 0, 10, 10, FLAG_ENABLED};
BOX box2 = {5, 5, 12, 12, FLAG_ENABLED};

void collideCheck(void)
{
    if((box1.brx >= box2.tlx) && (box2.brx >= box1.tlx) &&
       (box1.bry >= box2.tly) && (box2.bry >= box1.tly))
    {
        if((box1.flags & FLAG_ENABLED) &&
           (box2.flags & FLAG_ENABLED))
        {
            // Collision occurred
        }
    }
}

// Designed to compile for ARM (not thumb)
void collideCheckFast(void)
{
    // Read as a s32 the 1st 4 bytes of each struct into 2 registers
    s32 aBox = *(s32 *)(&box1);
    s32 dBox = *(s32 *)(&box2);

    {
        s32 aTop, aBottom, dTop, dBottom;

        // Use shifts to get at values as ARM does this for free
        aTop = aBox >> 24;
        aBox <<= 8;

        aBottom = aBox >> 24;
        aBox <<= 8;

        dTop = dBox >> 24;
        dBox <<= 8;

        dBottom = dBox >> 24;
        dBox <<= 8;

        // Above tempvars go out of scope so vars below will still be in register space
        if ( (aBottom >= dTop && dBottom >= aTop))
        {
            s32 aLeft, aRight, dLeft, dRight;

            aLeft = aBox >> 24;
            aBox <<= 8;

            aRight = aBox >> 24;
            // no need for last box shift

            dLeft = dBox >> 24;
            dBox <<= 8;

            dRight = dBox >> 24;

            if(aRight >= dLeft && dRight >= aLeft )
            {
                // Check enable flag for both at same time
                if((box1.flags & box2.flags & FLAG_ENABLED))
                {
                     // Collision occurred
                }
            }
        }
    }
}



The first example would be compiled (even optimized) as a jumble of 8-bit ROM reads followed by compares, which is not very efficient. The second, while nearly 3x as long in source, will run much, much faster as it minimizes ROM accesses and uses the ARM's built-in capacity to do shifts for free. The above code may have bugs, so use at your own risk (coded on the fly), but it shows what knowing the hardware can do for optimizing.

#36319 - sandymac - Thu Feb 17, 2005 7:48 am

I think the following is the "corrected" form of what Miked0801 typed from memory.

Code:

// These coords make more sense to me
BOX box1 = {0, 10, 0, 10, FLAG_ENABLED};
BOX box2 = {5, 12, 5, 12, FLAG_ENABLED};

// Designed to compile for ARM (not thumb)
void collideCheckFast(void)
{
   // Read as a s32 the 1st 4 bytes of each struct into 2 registers
   s32 aBox = *(s32 *)(&box1);
   s32 dBox = *(s32 *)(&box2);
   
   {
      s32 aTop, aBottom, dTop, dBottom;
      
      // Use shifts to get at values as ARM does this for free
      aBottom = aBox >> 24;
      aBox <<= 8;
      
      aTop = aBox >> 24;
      aBox <<= 8;
      
      dBottom = dBox >> 24;
      dBox <<= 8;
      
      dTop = dBox >> 24;
      dBox <<= 8;
      
      // Above tempvars go out of scope so vars below will still be in register space
      if ( (aBottom >= dTop && dBottom >= aTop))
      {
         s32 aLeft, aRight, dLeft, dRight;

         aRight = aBox >> 24;
         aBox <<= 8;

         aLeft = aBox >> 24;
         // no need for last box shift

         dRight = dBox >> 24;
         dBox <<= 8;

         dLeft = dBox >> 24;

         if(aRight >= dLeft && dRight >= aLeft )
         {
            // Check enable flag for both at same time
            if((box1.flags & box2.flags & FLAG_ENABLED))
            {
               // Collision occurred
            }
         }
      }
   }
}

Corrections of my correction are welcomed. :-)
_________________
"He who dares not offend cannot be honest." -- Thomas Paine
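
A hedged variation on the packed-read idea above, for readers who want something they can compile outside the thread. It rests on several assumptions: BoxSketch, CollideSketch and FLAG_ENABLED_SKETCH are hypothetical names, the flag value is made up because the thread never defines FLAG_ENABLED, and the target is assumed to be little-endian (as the GBA's ARM is). It copies the four bytes with memcpy rather than casting the struct pointer to s32 *, since the language does not guarantee that the cast works, and it uses unsigned values so that coordinates above 127 are not sign-extended by the shifts. Whether the memcpy really becomes a single 32-bit load depends on the compiler and optimisation level.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the BOX struct in the thread.
struct BoxSketch
{
   std::uint8_t tlx, brx, tly, bry;
   std::uint32_t flags;
};

// Assumed value; the thread never shows how FLAG_ENABLED is defined.
const std::uint32_t FLAG_ENABLED_SKETCH = 1u;

// True when both boxes are enabled and their extents overlap.
// Assumes little-endian: tlx lands in bits 0-7 and bry in bits 24-31.
inline bool CollideSketch (const BoxSketch& a, const BoxSketch& d)
{
   std::uint32_t aBox, dBox;
   std::memcpy (&aBox, &a, sizeof aBox);   // grab the four u8 fields together
   std::memcpy (&dBox, &d, sizeof dBox);

   // Unsigned shifts and masks, so values above 127 stay positive.
   const std::uint32_t aLeft   = aBox & 0xFF;
   const std::uint32_t aRight  = (aBox >> 8) & 0xFF;
   const std::uint32_t aTop    = (aBox >> 16) & 0xFF;
   const std::uint32_t aBottom = aBox >> 24;

   const std::uint32_t dLeft   = dBox & 0xFF;
   const std::uint32_t dRight  = (dBox >> 8) & 0xFF;
   const std::uint32_t dTop    = (dBox >> 16) & 0xFF;
   const std::uint32_t dBottom = dBox >> 24;

   if (aBottom < dTop || dBottom < aTop)
      return false;
   if (aRight < dLeft || dRight < aLeft)
      return false;

   // Check the enable flag for both boxes at the same time.
   return (a.flags & d.flags & FLAG_ENABLED_SKETCH) != 0;
}
```

As with the versions above, the only way to know whether this beats the plain field-by-field check on a given compiler is to look at the assembler output.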

#36378 - col - Fri Feb 18, 2005 11:13 am

LOst? wrote:

...
I'm stuck. Wherever I need something, I feel I need a constructor, a place where all my code for this object can have its own life, being private for everything else. That's why I can't leave C++. I'm doomed to be slow for the rest of my life I guess....

A typical class by me:
Code:

...
   Plane (int width, int height)
   {
      m_size = (width * height);

      data = new unsigned short [m_size];
      
      m_width = width;
      m_height = height;
   }
...



If you are using constructors in this way wherever you need something, you will potentially be losing a lot of cycles and memory !

The problem is that you are initialising all the members in the constructor body - which means they will be initialised twice !
iirc, this is because all members must be initialised before the constructor body starts, so if you don't specify values in an 'initialiser list', they will be default initialised before the constructor body is run.
To avoid this, you should be using an initialiser list:

Code:

Plane (int width, int height)
:   m_width (width),
    m_height (height),
    m_size (width * height),
    data ( new unsigned short [m_size] )
{
   //ctor body
}

(remember to keep the order of items in the list the same as the order they were declared in.)

cheers

Col

#36393 - sajiimori - Fri Feb 18, 2005 7:46 pm

Primitives are uninitialized by default, so there is no inefficiency here.

#36400 - col - Sat Feb 19, 2005 12:45 am

sajiimori wrote:
Primitives are uninitialized by default, so there is no inefficiency here.


mmm, nice nit ;)

My intent was to suggest that if this style of constructor without init list is used "Wherever I need something", then it has the potential to cause efficiency problems....

You're right about the PODs not being default initialized, however, using an initializer list - at least in this case - allows gcc to optimize the code better. I compiled both the above examples using the latest DevkitARM with -O3 and found the version with the initializer list produced smaller, faster ASM output.

cheers

Col
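
A small sketch of where the "initialised twice" concern does and does not apply. Counted, AssignInBody and UseInitList are hypothetical types made up for this illustration. A member of class type is default-constructed before the constructor body runs, so assigning to it in the body costs an extra construction plus an assignment; a primitive such as int is simply left uninitialised until the body writes to it.

```cpp
// Hypothetical member type that counts how often it is constructed or assigned.
struct Counted
{
   static int constructions;
   static int assignments;

   Counted () { ++constructions; }
   explicit Counted (int) { ++constructions; }
   Counted& operator= (const Counted&) { ++assignments; return *this; }
};

int Counted::constructions = 0;
int Counted::assignments = 0;

struct AssignInBody
{
   Counted member;
   int primitive;

   AssignInBody ()
   {
      // member was already default-constructed before this line runs,
      // so this builds a temporary and then assigns it.
      member = Counted (1);

      // primitive was not touched before the body: this is its only write.
      primitive = 1;
   }
};

struct UseInitList
{
   Counted member;
   int primitive;

   // Each member is constructed exactly once, in declaration order.
   UseInitList () : member (1), primitive (1) {}
};
```

Constructing an AssignInBody runs two Counted constructors and one assignment, while UseInitList runs a single constructor. For a class like Plane, whose members are all ints and a pointer, only the int-style case applies.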

#36402 - sajiimori - Sat Feb 19, 2005 1:12 am

Conclusion: compilers are stupid. ;)

#36413 - LOst? - Sat Feb 19, 2005 5:45 am

sajiimori wrote:
Primitives are uninitialized by default, so there is no inefficiency here.


More info please!

I don't like initializing things twice.

Remember that I always call constructors with new like this:

Code:

Plane* plane_a = new Plane (64, 32);

#38120 - Deanonious - Tue Mar 22, 2005 7:20 am

Yes, I have to admit that I have a really bad habit of not using init lists in my constructors, which is a major no-no.

Another major bottleneck in C++ coding that you have to avoid is Run Time Type Definition. It's a fairly cool feature of C++, but it kills run-time performance. I can't think of many others off the top of my head, but I did see a book at Barnes & Noble a little while back (I can't remember the title atm) that was all about C++ optimizations for time-critical applications (particularly games). It looked like a very useful read and a good reference to keep around. Let me know if you would like me to try to look it up for you.

Dean

#38121 - sajiimori - Tue Mar 22, 2005 8:00 am

That would be Run Time Type Identification, RTTI. C++ doesn't have any faculties for defining new types at runtime.

I haven't seen an appropriate use for RTTI yet. I guess it could be useful for debugging.

Sorry LOst, I didn't see that post from way-back-when. Still, I don't know what more information you want. You declared some primitives in your class, and they are not automatically initialized, so you are not initializing them twice.

In any case, it's way too early to think about optimization here.
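
On the RTTI point above, a short sketch of what the feature looks like in code, for anyone who has not met it. typeid and dynamic_cast are the standard C++ facilities; Shape, Triangle, Square, IsSquare and AsSquare are hypothetical names invented for the example. Both functions ask the object for its type at run time, which is the cost being discussed.

```cpp
#include <typeinfo>

struct Shape
{
   virtual ~Shape () {}
   virtual int Sides () const = 0;
};

struct Triangle : Shape
{
   int Sides () const { return 3; }
};

struct Square : Shape
{
   int Sides () const { return 4; }
};

// RTTI route 1: compare the dynamic type of s against Square.
inline bool IsSquare (const Shape& s)
{
   return typeid (s) == typeid (Square);
}

// RTTI route 2: returns a null pointer when s is not really a Square.
inline const Square* AsSquare (const Shape* s)
{
   return dynamic_cast<const Square*> (s);
}
```

In many cases a virtual function such as Sides() answers the same question without any type query, and GCC's -fno-rtti switch turns the feature off altogether.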

#38131 - Deanonious - Tue Mar 22, 2005 6:15 pm

That's right I remembered RTTI, I just couldn't remember what the I stood for *slaps himself*

I have only used it a few times myself because some of the CS professors at my University think that it is important, the way they had us use it was cool in a way, but for the most part it was just the "Lazy" way of doing it.

Dean

#38148 - Abscissa - Wed Mar 23, 2005 12:35 am

I recommend the book "C++ for Game Programmers". It has tons of info about all sorts of speed issues with C++. Although I prefer D myself.

#38152 - Dwedit - Wed Mar 23, 2005 2:05 am

Isn't division or multiplication by powers of 2 automatically converted to shifts by compilers?
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."

#38153 - jma - Wed Mar 23, 2005 2:43 am

Dwedit wrote:
Isn't division or multiplication by powers of 2 automatically converted to shifts by compilers?


This can be a common misconception. The standard answer is "yes", but there are factors to consider.

o Not all compilers do this (and they don't have to).
o Are you dealing with signed or unsigned values?

For the second point, consider the following:

Code:
int x = -1;

int a = x / 2;
int b = x >> 1;

assert(a != b);


a and b are two totally different values. Did you want the compiler to actually turn the divide operation into a shift in this case?

Processors have methods of getting around this. For example, on the x86, a would be calculated like so:

Code:
cdq
sub eax,edx
sar eax,1


This is (simply put) a little sign extend trick that could get simulated on the ARM like so:

Code:
mov   r1,#0
movs  r0,r0 ; test sign bit
mvnmi r1,r1 ; simulate sign extend
sub   r0,r0,r1
mov   r0,r0,asr #1


The same effect, but still more work than a simple shift operation, with a conditional involved, which is much slower than just using the shift operator. So... use shift when you can, and don't just expect a compiler to write good code for you.

Jeff M.
_________________
massung@gmail.com
http://www.retrobyte.org
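
The signed divide-versus-shift difference above can be sketched in C++ as follows. The three function names are hypothetical, and the code assumes a 32-bit int and a compiler that shifts negative values arithmetically (what mainstream compilers do, although the language only guaranteed it from C++20 onward).

```cpp
// x / 2 rounds toward zero.
inline int HalfByDivide (int x)
{
   return x / 2;
}

// x >> 1 rounds toward negative infinity when the shift is arithmetic.
inline int HalfByShift (int x)
{
   return x >> 1;
}

// The fix-up from the x86 listing: subtract the sign word (0 or -1),
// which adds 1 to negative inputs, and only then shift.
inline int HalfByBiasedShift (int x)
{
   int sign = x >> 31;   // 0 for non-negative x, -1 for negative x
   return (x - sign) >> 1;
}
```

For x = -1, HalfByDivide gives 0 and HalfByShift gives -1, while HalfByBiasedShift agrees with the division. For non-negative values all three agree, which is why the plain shift is safe whenever the value is known to be unsigned or non-negative.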