gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

C/C++ > Lists

#85271 - linus - Sun May 28, 2006 11:56 am

Hello

Im just making the switch from Java to CPP and I was trying to weigh up the pros and cons of having the linked list built into the data structure as opposed to having a seperate List class.

I want to go with the List class but I dont know how to reference any object, I had a void reference but when it came to using it I couldnt type cast it into a normal reference so I couldnt use the object.

I think it would be better to have a list class because i could write all my List operations in one class (as opposed to having to write the same code for different data structures that are also lists).

Building the list into the data structure is easier and I see it done all over the place in examples but I think the Java in me is just telling me its wrong. What approach should I take?

#85273 - keldon - Sun May 28, 2006 12:55 pm

It all depends on what you are trying to achieve and what your use for it is. An arraylist and a linkedlist have different speeds for each operation so depending on how you are using the list will detirmine which data structure you use.

#85282 - linus - Sun May 28, 2006 5:59 pm

Well basically im going to go for a LinkedList class and i want each LinkedList object to have a reference to another object, how to I acheive this?

I tried having a void reference but i could nt type cast it to anything - how is this done?

#85287 - DekuTree64 - Sun May 28, 2006 6:21 pm

Try a void pointer, instead of a reference. Or a templated class. Templates are probably a bit ugly and confusing for someone new to the language though.

I tend to prefer building the list into the class if it's just a one-directional list and will always be used. 2-directional lists usually end up needing enough code to warrant a seperate class.

If all the objects in the list are in an array, you could write a generic indexed list class, which wouldn't require pointers to the objects at all. Something like
Code:
class IndexedList
{
public:
    enum
    {
        MAX_NODES = 255,
        END_MARKER = 0xff,
    };

    struct Node
    {
        u8 next, prev;
    };

    u8 m_Head;
    Node m_Node[MAX_NODES];
};

Then to run through the list,
Code:
for (int index = list.m_Head; index != IndexedList::END_MARKER; index = list.m_Node[index].next)
{
    object[index].DoSomething();
}

_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

#85297 - sajiimori - Sun May 28, 2006 7:39 pm

Agreed that templates probably aren't good for a beginner, but revisit them later to see how they can improve your linked list code.

#85468 - phantom-inker - Tue May 30, 2006 5:29 pm

linus wrote:
Im just making the switch from Java to CPP and I was trying to weigh up the pros and cons of having the linked list built into the data structure as opposed to having a seperate List class.

Since the above answers didn't really seem to answer the question very well, I thought I'd chime in.

There are pros, and there are cons aplenty in C++. Which you choose depends a lot on what kind of performance you want, and how complicated you like your code to be. Generally, for arrays and lists, I use "roll your own every time," and for more complex data structures like red-black trees, I'll use template classes and template functions, since a larger percentage of the core functionality doesn't change between data types.

Let's look at what you're asking, though, to see how to best compare these two designs. First, the two possibilities:
Code:

Design #1, the "clean" design:

class MyData {
    ...
};

class ListEntry {
    ListEntry *next, *prev;
    void *data;
};

----------------------

Design #2, the "ugly" design:

class MyDataInAList {
    MyDataInAList *next, *prev;
    ...
};

In Java, as with many higher-level languages, the preferred design is #1. In this design, linked-list manipulation can be readily abstracted out as methods of a List container class, or as methods of ListEntry, or as separate functions.

In C, as with many lower-level languages, the preferred design is #2. In this design, separate manipulation functions are used for each type, although often the manipulation operations are implemented directly at the time you need them.

So what's best in C++? It's a tradeoff, one that can only be understood fully by realizing what the memory allocator is doing under the hood. (Memory-allocator issues are less relevant in Java because most JVMs implement a generational copy-collector; you pay for the simplicity with some loss in performance, but you usually get relatively un-fragmented memory with minimal padding. In C++, you have to handle memory allocation yourself, but you have greater flexibility, which can translate into greater speed and even less fragmentation if used correctly.)

The first step is to look at what happens with a "new" or "malloc()" operation in C/C++. In the first design, there are two news (or two mallocs) for every list entry: One for the data, and one for the list-entry with its pointers. The cost of new/malloc is not zero: In general, you can assume it's O(1) amortized time, and O(1) amortized space. It's good that it's O(1) and not O(lg n) or O(n), but O(1) is still not O(0) --- some time and some space is required for every new/malloc. In this case, it's easy to see that with two structures being created, there's twice as much time and twice as much space being used as the single-class version.

So how much time and space per new/malloc, then? A good memory allocator for a 32-bit platform has about 8 bytes of memory waste per new/malloc. And search times are on the order of a couple hundred cycles per new/malloc, on average.

Let's put this in a more practical perspective. Let's say you want to have a linked list where each node stores four 32-bit integers, x, y, type, and state. The cost of this data is 4 bytes per integer = 16 bytes total. Now, add room for two pointers, next and prev = 8 bytes total. Your total "important" data is then 24 bytes. Now, in the second design, we just add in the overhead of a single malloc, and it becomes 32 bytes total --- in short, 25% of your memory will be used as overhead for the allocator. In the first design, however, we have one malloc for the first 16 bytes --- for a total of 24 bytes for the data object --- and one malloc for the linked list --- 8 bytes of next and prev + 4 bytes of object pointer + 8 bytes of overhead = 20 bytes for the linked-list storage. So in the first design, for every node, we end up using 24 bytes + 20 bytes = 44 bytes to store 24 bytes of meaningful information, or 54% of your memory as overhead.

Let's look at allocation times, too: If it's, say, 200 clock cycles per malloc, on average, it's simple enough to show that it takes 400 clock cycles for the "cleaner" design and 200 clock cycles for the "ugly" design. This is to say nothing of the fact that eventually, you'll probably have to delete/free these structures, and that usually comes at the same cost as new/malloc.

These numbers should give you a clue as to why most C programmers will prefer the second model. Yes, you have code duplication in this model, and more chance of errors because you have more code, but it comes at a much lighter cost in memory and time. On a platform like the GBA, where memory and processor are always tight resources, you can't really afford to waste 50% of your memory in overhead and spend double the time to allocate your data.

Or maybe you can; I don't know; it depends on your project.

C++ offers a way around this problem using templates. In effect, templates are like another preprocessing phase, where the compiler invisibly generates design #2 given design #1 and some special keywords to tell it how to do it. Templates are not for neophytes, and even experts can get them wrong. (And even compilers can get them wrong.) And, worse, they can often make your code hard to read if they're not used very carefully. They're worth studying, but only when you feel you're pretty solid in your understanding of C++.

So in general, unless you have a collection type that lends itself well to having a lot of its operations be performed externally (such as a heap-based priority queue or a splay tree), I'd recommend keeping your pointers with your data unless you have a really good reason to separate them out. It may take a lot of extra effort to manage that design, but you'll be well-rewarded for your efforts in both time and memory savings.
_________________
Do you suppose if I put a signature here, anyone would read it? No? I didn't think so either.

#88847 - DrkMatter_ - Wed Jun 21, 2006 9:20 pm

If you opt for separating your data from your list operations, you should (probbably) use the standard std::vector or std::list instead of programming your own.

#88862 - sajiimori - Wed Jun 21, 2006 10:26 pm

Definitely check what STL is doing behind the scenes. Some implementations are optimized for environments with virtual memory.

#88881 - col - Thu Jun 22, 2006 12:52 am

@PhantomInker

I have to disagree with some of the points in your post.

You seem to be suggesting that the main reason the 'clean' design would be chosen is that it allows you to generalize the list management code. I feel that this is a red herring !
The real benefit of this approach is that it allows loose coupling between the container and the containee. The object in the list need know nothing of the list mechanism.
(This is significant because it allows you to keep a list of instances of classes that you may not have access to the implementation source for - significant for distributed systems, large projects with big teams etc.)

Generalizing the list management code is a very useful and powerful technique, and it can be done easily without requiring two mallocs. You just need to include a list node instance as one of the attributes of any class you want to make listable, and provide an accessor method(or have the classes constructor insert the instance into a given list).
You lose the loose coupling, but you save the extra malloc, and get the generalized list management code.
This changes the trade off considerably - the size and speed overhead of this third approach compared with the 'ugly' design is comparatively small, just an extra layer of pointer de-referencing. You would only need to go for the ugly method where the list is processed in a time critical component. (The 'ugly' method can be a maintenance nightmare, so it really should only be used when there is no alternative)

Your explanation of templates also seems misleading: "templates are like another preprocessing phase, where the compiler invisibly generates design #2 given design #1 and some special keywords to tell it how to do it."
Templates cannot generate the 'ugly' design under the hood given the 'clean' design. The whole point of separating the object instance from the list node instance is that the objects design does not require any list logic or references - the 'ugly' design is not possible without breaking this. What templates can do is allow you to generalize your list code without destroying type safety (a big deal in c++)

#89010 - phantom-inker - Thu Jun 22, 2006 7:52 pm

col wrote:
The real benefit of this approach is that it allows loose coupling between the container and the containee. The object in the list need know nothing of the list mechanism.

Yes, but that generalization does come at a cost in many cases. A highly generic list template in C++ is often a bad choice on embedded hardware when performance is tight. On a big system, sure, have at it; in my own code on high-end systems, I abuse the heck out of generic red-black tree templates. But I'd never use the same code on the GBA; it's just too tight a system.

col wrote:
Generalizing the list management code is a very useful and powerful technique, and it can be done easily without requiring two mallocs.

It can be, but if you're looking for a truly generic list like he's used to in Java, the attribute of the list entry is a pointer to the data, not the data itself. I went for explanations a Java programmer could understand, not quotes from the ARM.

col wrote:
You would only need to go for the ugly method where the list is processed in a time critical component. (The 'ugly' method can be a maintenance nightmare, so it really should only be used when there is no alternative)

That I will agree with. But most of the GBA is time-critical; you often don't have clock cycles to burn on that system, and often don't have memory to burn either. My own GBA code doesn't even have a malloc() or free() in it, simply because (having implemented not a few mallocs in my day) I know what the performance costs are there. They can be low, but on the GBA, every clock and every byte counts. I statically allocate everything, and in my code even supposedly "dynamic" lists' nodes are specially-allocated using bitmaps and static buffers. I would never do that on a PC, but on the GBA, you should code like it's the embedded hardware that it is.

col wrote:
Your explanation of templates also seems misleading

Well, yes and no. Remember that he's coming from a Java background where there are no preprocessing phases and nothing even remotely equivalent to templates (although I hear the Java committee is talking about them now). Trying to explain what templates actually do in C++ isn't the easiest of tasks, so I went for an oversimplified but comprehensible explanation.
_________________
Do you suppose if I put a signature here, anyone would read it? No? I didn't think so either.

#89044 - col - Thu Jun 22, 2006 9:34 pm

phantom-inker wrote:
col wrote:
The real benefit of this approach is that it allows loose coupling between the container and the containee. The object in the list need know nothing of the list mechanism.

Yes, but that generalization does come at a cost in many cases...

Of course it does - nobody is denying that
Quote:

col wrote:
Generalizing the list management code is a very useful and powerful technique, and it can be done easily without requiring two mallocs.

It can be, but if you're looking for a truly generic list like he's used to in Java, the attribute of the list entry is a pointer to the data, not the data itself. I went for explanations a Java programmer could understand, not quotes from the ARM.

My point is that you described the two extremes without any hint of the many flavours of compromise in-between that are more suitable for a lot of tasks on GBA or any other system.
Quote:

col wrote:
You would only need to go for the ugly method where the list is processed in a time critical component. (The 'ugly' method can be a maintenance nightmare, so it really should only be used when there is no alternative)

That I will agree with. But most of the GBA is time-critical; you often don't have clock cycles to burn on that system, and often don't have memory to burn either.... but on the GBA, every clock and every byte counts....

If you are sure that every aspect of your application is time and space critical, you should be using assembly.
In most non-trivial applications (even in GBA!), there is a great deal of non time critical code, in these sections, the benefits of easy maintanence usually outweigh the need for efficiency.
Quote:

col wrote:
Your explanation of templates also seems misleading

Well, yes and no. Remember that he's coming from a Java background where there are no preprocessing phases and nothing even remotely equivalent to templates ... Trying to explain what templates actually do in C++ isn't the easiest of tasks, so I went for an oversimplified but comprehensible explanation.

I would say inaccurate therefore confusing :) - maybe it's just a bad idea trying to introduce a topic like templates in a single paragraph ;)

#90358 - sniper - Thu Jun 29, 2006 9:39 pm

cmon why not using templates for lists ? like the STL does. STL is a bit big, but also good ;)

Code:

#include <list>

using namespace std;

int main( int argc, const char **argv )
{
   list<int> myList;
   
   myList.push_back(2);
   myList.push_back(3);
   myList.push_front(1);
   
   for( list<int>::const_iterator i = myList.begin(); i != myList.end(); i++ )
   {
      printf("i = %d\n", *i);
   }
   
   return 0;
}