gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

Coding > Bullet Sprites

#12868 - sigma - Sat Nov 29, 2003 1:56 am

Suppose I have an array of bullets. Each bullet structure has a byte that indicates its type. When a bullet "dies" the type byte is set to zero to indicate that it's "dead", and the collision, spawn and other routines use this to detect empty slots. But I want to know if it would be better to instead defragment the bullet array every time one dies in a collision or screen exit. This would make the control logic more efficient, but I wonder if the defrag overhead would kill it, because it seems more likely for the earliest bullets to die, thus requiring the largest block copy.
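
For reference, a minimal sketch of the slot-marking scheme described above. The struct layout and names are invented for illustration; the post itself contains no code.
Code:

typedef unsigned char u8;     // the usual GBA typedefs
typedef signed short  s16;

#define MAX_BULLETS 64

typedef struct{
    u8  type;     // 0 = dead/empty slot, nonzero = bullet type
    s16 x, y;
    s16 dx, dy;
} Bullet;

Bullet bullets[MAX_BULLETS];

void updateBullets(){
    for (int i = 0; i < MAX_BULLETS; ++i){
        if (bullets[i].type == 0)
            continue;            // skip dead slots
        bullets[i].x += bullets[i].dx;
        bullets[i].y += bullets[i].dy;
        // collision and screen-exit checks would set bullets[i].type = 0 here
    }
}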

#12869 - DekuTree64 - Sat Nov 29, 2003 2:19 am

Depends on how many bullets are likely to be around. It probably wouldn't take too long to just skip the defragmenting and search through the array every time. I think the best solution would be a linked list. Just keep an array like you are right now of the maximum possible bullets, but give each bullet a 'next' member that just holds the index of the next bullet in the list. If you limit your max bullets to 255, you only need one byte extra per bullet for the linked list pointer, and you can set it to 255 to mean that it's the last alive bullet in the list.

Then keep a 'free' list as well, and whenever you need a bullet, take the current free list head. Let's call it bill, just for clarity (and to make a joke of Mario's Bullet Bill^^). Now set the free list head to bill's 'next' index, set bill's next to the used list head, and set the used list head to bill, and you're done. If a bullet dies, let's name this one fred, because it rhymes with dead, just go through the list until you hit one whose next is set to fred's index, then set that one's next to fred's next, set fred's next to the free list head, and set the free list head to fred.
Is that confusing or what? Linked lists are tricky at first but it's really a neat way of keeping track of things to where you can insert/remove them as you wish, and not have any dead space.
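
A rough sketch of what that might look like in C, using 0xff as the end-of-list marker. The names (and the extra check for removing the head of the used list) are added for illustration; this is not Deku's actual code.
Code:

typedef unsigned char u8;

#define MAX_BULLETS 255
#define LIST_END    0xff

typedef struct{
    u8 type;
    u8 next;      // index of the next bullet in whichever list this one is on
    // ... position, velocity, etc.
} Bullet;

Bullet bullets[MAX_BULLETS];
u8 usedHead = LIST_END;
u8 freeHead = 0;

void initBulletLists(){
    for (int i = 0; i < MAX_BULLETS; ++i){
        bullets[i].next = i+1;
    }
    bullets[MAX_BULLETS-1].next = LIST_END;
    usedHead = LIST_END;
    freeHead = 0;
}

u8 spawnBullet(){                         // take "bill" off the free list
    u8 bill = freeHead;
    if (bill == LIST_END)
        return LIST_END;                  // out of bullets
    freeHead = bullets[bill].next;        // free head moves to bill's next
    bullets[bill].next = usedHead;        // bill points at the old used head
    usedHead = bill;                      // bill becomes the new used head
    return bill;
}

void killBullet(u8 fred){                 // put "fred" back on the free list
    if (usedHead == fred){
        usedHead = bullets[fred].next;    // fred was the head of the used list
    }else{
        u8 i = usedHead;
        while (bullets[i].next != fred)   // find the bullet pointing at fred
            i = bullets[i].next;
        bullets[i].next = bullets[fred].next;
    }
    bullets[fred].next = freeHead;
    freeHead = fred;
}
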
_________________
___________
The best optimization is to do nothing at all.
Therefore a fully optimized program doesn't exist.
-Deku

#12871 - sajiimori - Sat Nov 29, 2003 3:31 am

Limiting your max objects (bullets) and simultaneously using a linked list would be weird, because the primary purpose of a linked list is to avoid arbitrary limitations.

When you're using a fixed-size array, on-the-fly defragmentation would be best when the array tends to have a lot of empty entries. It's quite simple, as long as the order of the objects in the array doesn't matter. When you create a new object, put it in the first unused slot. When you destroy an object, move the last object to the newly freed slot.

If the order of the objects matters, defragmentation will be slow.
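
A possible sketch of that, assuming the order of live bullets doesn't matter; the struct and names are invented for illustration.
Code:

typedef unsigned char u8;
typedef signed short  s16;

#define MAX_BULLETS 64

typedef struct{
    u8  type;
    s16 x, y;
} Bullet;

Bullet bullets[MAX_BULLETS];
int numBullets = 0;                       // live bullets always occupy slots 0..numBullets-1

Bullet* spawnBullet(){
    if (numBullets >= MAX_BULLETS)
        return 0;                         // no free slots
    return &bullets[numBullets++];        // hand out the first unused slot
}

void destroyBullet(int i){
    bullets[i] = bullets[--numBullets];   // move the last live bullet into the hole
}

One thing to note: if you destroy bullets while walking the array forward, the entry that gets moved into slot i still needs processing, so re-check that slot (or iterate backwards).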

#12875 - torne - Sat Nov 29, 2003 5:58 am

sajiimori wrote:
Limiting your max objects (bullets) and simultaneously using a linked list would be weird, because the primary purpose of a linked list is to avoid arbitrary limitations.


Not if you use links which are array indexes, instead of pointers. What I believe Deku was describing was having an array of 255 bullet structs, each one of which had the index of the next valid bullet (not necessarily next in the array, just next in the list) as an extra byte field. This is a perfectly reasonable way to use a linked list/array as it saves 3 bytes per entry at the (probably trivial) cost of having a maximum number of entries. No dynamic memory allocation of any kind is required to do it this way either, which helps a lot. =)

#12878 - sajiimori - Sat Nov 29, 2003 8:43 am

How does it save 3 bytes per entry?

edit: I guess you're just saying that having a 1-byte reference to the next array index is 3 bytes shorter than having an actual pointer. I still think the method is odd, and the added complexity and memory use doesn't seem necessary.

You're saving time when destroying objects (maybe), but using more time when traversing the array (definitely). Since you traverse more often than you destroy, it just seems strange to me.

Oh well, maybe there's some situation that it's useful and I just can't think of it right now.

#12885 - torne - Sat Nov 29, 2003 4:52 pm

sajiimori wrote:
edit: I guess you're just saying that having a 1-byte reference to the next array index is 3 bytes shorter than having an actual pointer. I still think the method is odd, and the added complexity and memory use doesn't seem necessary.


I don't see why it's more complex, and it only uses a fixed amount of memory. Linked lists within arrays are used a lot; it's a very common technique.

Quote:
You're saving time when destroying objects (maybe), but using more time when traversing the array (definitely). Since you traverse more often than you destroy, it just seems strange to me.


It's exactly as fast as using a linked list for all operations, but uses no dynamic memory allocation and makes entries 3 bytes shorter. Where's the problem?

#12886 - poslundc - Sat Nov 29, 2003 4:56 pm

It seems to me the biggest advantage is sparing the dynamic memory allocation, which can be a real bugger on a platform like the GBA.

That is to say, I don't know how effective or efficient the malloc() and free() routines are in DevKitAdvance, but I would probably want to maintain my own heap, which is a lot of extra work and processing to do.

It seems to me that if you know the limits of your gameplay then it makes much more sense to reserve the necessary memory ahead of time than rolling the dice with dynamic memory allocation. It keeps everything far more structured and less fragmented.

You can take the exact same time in traversing the array: just store pointers instead of indices, and it will behave exactly the way a linked list does. Then maintain a separate stack of available pointers in EWRAM that you can simply push/pop pointers onto as you free/allocate them.

Dan.

#12891 - torne - Sat Nov 29, 2003 7:45 pm

You don't need pointers; a decent optimiser will generate code that's equivalent using array indexes. =)

#12892 - col - Sat Nov 29, 2003 8:22 pm

One great advantage of using array indices instead of pointers is that you can separate the list mechanism from the objects you are managing, but with little (or no?) overhead.
This means that if the objects are used in speed-critical code, you can keep all the list management stuff in IWRAM, but keep your (possibly large) object data in EWRAM.

say you have an array of 20 baddies.

create a separate array of indices
Code:

Baddie myBaddies[20];
u8 myList[20];
u8 firstFree;

you must do some initialisation to create a complete free list after you have initialised the baddie array:
Code:

void initBaddieList(){
    for (int i = 0; i < 19; ++i){
        myList[i] = i+1;
    }
    myList[19] = 0xff;    //a guard to check for running out of baddies with an assert
    firstFree = 0;
}

and you need functions to manage the list
Code:

u32 reserveBaddie(){
    assert(firstFree != 0xff);
    u32 index = firstFree;
    firstFree = myList[firstFree];    //update the list
    return index;
}

void releaseBaddie(u32 idx){
    myList[idx] = firstFree;    //put the released slot back at the front of the free list
    firstFree = idx;
}



what this means is that instead of using pointers as object handles, you use indices.

you can use the same indexListArray (in this example - 'myList'), and generalized versions of the management functions to manage your active objects as well.
for each list, you just need to keep the index value of the first object in the list...

this technique is not perfect, it's not really a linked list at all, it's just a stack (Edit. it's a linked list but you can only insert and remove easily at the front?), but in the right situation, it can be more efficient than a standard linked list approach, and it can also be used to manage objects that don't have internal list pointers...
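
For example, the functions above might be used something like this; initBaddie() is a made-up per-baddie setup routine, not part of col's code.
Code:

u32 spawnEnemy(){
    u32 slot = reserveBaddie();      // asserts if we've run out of baddies
    initBaddie(&myBaddies[slot]);    // hypothetical setup of the reserved baddie
    return slot;                     // keep this index as the object handle
}

void enemyDied(u32 slot){
    releaseBaddie(slot);             // the slot goes back on the front of the free list
}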

I hope this is useful to someone.


cheers

Col

#12896 - sigma - Sun Nov 30, 2003 1:19 am

I decided to do the defragmenting method anyway. It was easier to code than I'd figured, and it allowed me to optimize register use in several procedures. I also can't afford much indirection and dereferencing for what I'm doing (arrays of pointers and linked lists are definitely out).

#12897 - sajiimori - Sun Nov 30, 2003 2:35 am

torne et al.,

Somehow I must have given the impression that I thought a dynamically allocated linked list was appropriate for this situation. I don't, so there's no need to argue against using one.

#12901 - poslundc - Sun Nov 30, 2003 7:01 am

torne wrote:
You don't need pointers; a decent optimiser will generate code that's equivalent using array indexes. =)


I would think pointers would actually be more efficient in the general case, although consuming slightly more memory than just storing indices.

If I'm traversing the list using indices, it means I need to load the next index from memory, shift and add it to the base pointer register, and load the new value. If I'm using pointers, I just load the next pointer and I'm there, without having to keep track of the base in another register. If it's in IWRAM, there is no speed difference to loading a 32-bit word as opposed to 16- or 8-bits.

It should make very little difference either way, though; using indices instead of pointers would only mean one extra register and one extra ALU instruction (two in Thumb). If you're using EWRAM instead of IWRAM, on the other hand, indices will be much more efficient so long as you store them using u8 or u16 data types.
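
To illustrate the comparison, roughly: both link styles are shown in one struct purely for a side-by-side look (a real bullet would have one or the other), and the actual instructions generated will vary by compiler.
Code:

typedef unsigned char u8;

typedef struct Bullet{
    u8             next;      // index-linked: index of the next live bullet
    struct Bullet* nextPtr;   // pointer-linked: address of the next live bullet
    // ... bullet data ...
} Bullet;

Bullet bullets[255];

void traverseByIndex(u8 head){
    // each step loads the index, then scales and adds it to the array base
    for (u8 i = head; i != 0xff; i = bullets[i].next){
        // update bullets[i] ...
    }
}

void traverseByPointer(Bullet* head){
    // each step just loads the next pointer and you're there
    for (Bullet* b = head; b != 0; b = b->nextPtr){
        // update *b ...
    }
}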

Dan.

#13019 - Miked0801 - Wed Dec 03, 2003 7:19 pm

Another way to prevent fragmentation would be to allocate your bullet list as a stack. push your max bullet amount onto the stack as empty (0) and pop them off when created - then pushing them back when they die. No fragmentation, and stacks are as easy as an array of items with an index to the top element. Fast and assembly friendly as well.

Mike

#13028 - sajiimori - Wed Dec 03, 2003 11:14 pm

Mike,

I don't understand your suggestion. You want to make a stack of all zeros ("push your max bullet amount onto the stack as empty"), then pop zeros off the stack ("pop them off when created") and put them somewhere (where would that be?), then later push the zero back onto the stack ("pushing them back when they die")?

Perhaps you meant to suggest pushing objects onto the stack when they're created, and popping them off when they're destroyed. But that still wouldn't work unless the last object that was created will be the first one that is destroyed (which is obviously not the case for things like bullets).

#13035 - Miked0801 - Thu Dec 04, 2003 1:07 am

I'll try again and explain a bit better (take longer to type the message :)

I'm suggesting pushing the max number possible (for your game/app) of "I'm dead now" or if you want "uninitialized" bullet structures onto this stack. When a new bullet is needed, pop a bullet off this stack and initialize it for your game. When you're all finished with the bullet (it's dead or whatever), set it back to dead (or not, this is just a practice to help with dangling pointers. If you're careful, you technically don't need to set it as dead) and push it back onto the stack. Doing it this way (as opposed to putting live bullets in a stack and removing them when they die) completely eliminates fragmentation.

We do a similar trick here with in-game actor structures - create an array of the maximum number of actors needed, pop off this stack when the actors are created, and push them back on when they die.

#13037 - ampz - Thu Dec 04, 2003 2:06 am

poslundc wrote:
torne wrote:
You don't need pointers; a decent optimiser will generate code that's equivalent using array indexes. =)


I would think pointers would actually be more efficient in the general case, although consuming slightly more memory than just storing indices.

If I'm traversing the list using indices, it means I need to load the next index from memory, shift and add it to the base pointer register, and load the new value. If I'm using pointers, I just load the next pointer and I'm there, without having to keep track of the base in another register. If it's in IWRAM, there is no speed difference to loading a 32-bit word as opposed to 16- or 8-bits.

It should make very little difference either way, though; using indices instead of pointers would only mean one extra register and one extra ALU instruction (two in Thumb). If you're using EWRAM instead of IWRAM, on the other hand, indices will be much more efficient so long as you store them using u8 or u16 data types.

Dan.


Loading a value from an array using the index is a single instruction operation in ARM mode, no matter the size of each element in the array.
It is a single instruction operation even if the index is a constant.

Loading a value from an array using a pointer is a two instruction operation when the pointer is a constant, otherwise it is a single instruction operation as well.

Not sure of the number of instructions in thumb mode.

#13038 - poslundc - Thu Dec 04, 2003 2:33 am

ampz wrote:
Loading a value from a array using the index is a single instruction operation in ARM mode. No matter the size of each element in the array.
It is a single instruction operation even if the index is a constant.


That's the case only if you already know which index you are loading (ie. the index is already stored in a register). We are discussing forming a pseudo-linked-list by including the index into the array for next item as part of the data structure. In which case, you would need to load that index from memory, so it will take at least as many instructions as loading a pointer directly.

Dan.

#13039 - sajiimori - Thu Dec 04, 2003 2:49 am

Miked0801,

Maybe the reason I don't understand you is because you are using the terms 'push' and 'pop' in very unconventional ways. Typically, when you 'push' something, you add information to a stack. When you 'pop' something, you remove information from a stack.

If the stack contains no information, it sounds very strange to say you are 'pushing' things onto it. The stack you describe is either all zero/null, or may as well be all zero/null (because all the objects are considered 'dead'), and thus it contains no information.

By interpreting your explanation more generously, I would guess that by 'popping' off the stack you mean allocating that area on the stack to be used as a live object, and by 'pushing' onto the stack you mean freeing it to be reused. (Incidentally, the 'stack' would then be backwards because all the live data is beyond the top of the stack, and the stack itself contains undefined values in the form of dead objects.) However, to maintain some sort of an analogy to a stack, the object that was allocated last would still have to be freed first.

#13055 - Miked0801 - Thu Dec 04, 2003 6:38 pm

No, I mean what I say with push (onto stack) and pop (off of stack). I'm just creating a dead list instead of a live one.

An example:

Say there can be a maximum of 20 bullets in the game. 1st step then is to Push all 20 uninitialized bullets onto the stack. This stack is now where you will grab bullets as they are needed in the game.

Now, let's say that a bullet is needed. You Pop a bullet structure off this stack and initialize its contents for its lifetime. Let's say you need another bullet before the 1st one dies. You Pop another structure off the stack (truthfully, you're probably only storing pointers as that's much more efficient, but I digress) and set it up.

The stack now currently has only 18 entries in it.

Now let's say that one of the two bullets has died (doesn't matter which). You take that structure (pointer) and Push it back onto the stack, making 19 available.

Another bullet is needed, Pop whatever is available from the stack and set it up for use. Now 18 on the stack.

Both die, push them both back on the stack and 20 are available.

As you can see, no memory fragmentation and only pointer manipulation needed if done right. The only minor catch is that the bullet must keep track of its own pointer while alive so it can properly push itself back onto the stack on death.

I stated this above, but for maximum efficiency, you Malloc an array of 20 bullet structures to begin with and only place 20 pointers in the stack. That way you only need to play with pointers and malloc memory doesn't fragment with constant alloc/frees.

Hope this makes it clear what I'm saying. It's just a slightly different way of approaching the problem that completely removes fragmentation and is very fast to boot.
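
A minimal sketch of how that might look, storing pointers as suggested. The pool size, names and bullet contents are purely illustrative, and the pool is a static array here rather than a one-off malloc, which amounts to the same thing.
Code:

#define MAX_BULLETS 20

typedef struct{
    int x, y;    // whatever bullet data you need
} Bullet;

Bullet  bulletPool[MAX_BULLETS];     // allocated once up front, never freed
Bullet* deadStack[MAX_BULLETS];      // stack of available ("dead") bullets
int     deadTop = 0;                 // number of entries currently on the stack

void initBullets(){
    for (int i = 0; i < MAX_BULLETS; ++i)
        deadStack[i] = &bulletPool[i];    // Push every bullet as dead/available
    deadTop = MAX_BULLETS;
}

Bullet* spawnBullet(){
    if (deadTop == 0)
        return 0;                    // out of bullets
    return deadStack[--deadTop];     // Pop an available bullet and initialize it
}

void killBullet(Bullet* b){
    deadStack[deadTop++] = b;        // Push it back onto the stack for reuse
}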

Mike

#13063 - torne - Thu Dec 04, 2003 7:10 pm

Mike, I don't think that gains you anything over using an array. You still have a maximum number of bullets, which is the only problem with using a fixed-size array, and I would've said, from the amount of time it's taken for us to understand what you mean, that your method is more complex than just using a free list (which is effectively the same as using a stack, just laid out a different way in memory). Fragmentation in terms of the actual memory allocation is zero in both cases; if you were assuming that fragmentation of *active* bullets would be reduced, then you're wrong (unless the most/least recently created bullet is always the first to be destroyed, which is again the same as using an array).

#13064 - sajiimori - Thu Dec 04, 2003 7:18 pm

Quote:

Now, let's say that a bullet is needed. You Pop a bullet structure off this stack and initialize its contents for its lifetime.

That implies making a copy of the structure. Since the structure is 'dead' (and thus contains no information), copying it is useless.
Quote:

truthfully, you're probably only storing pointers as that's much more efficient, but I digress

No, you don't digress ;-) because popping a pointer off a stack makes sense, while popping 'no information' (a dead struct) off a stack makes no sense.

It then follows that the array is not an array of objects, but rather an array of object references. The objects themselves can be allocated anywhere, and if they are also allocated as an array then they will not be fragmented in memory.

However, that was not the sort of fragmentation the original question was referring to. The original poster assumed that an array would be used, and thus was not concerned about memory fragmentation, but rather fragmentation of live entries in the array of structs.

The reason the poster was concerned was because he/she did not want to iterate over the entire array if many of the entries are dead, because that is less efficient than iterating over the live objects and stopping when a dead object is reached (which can be accomplished by defragmenting the live entries).

A side effect of defragmenting the live entries is that the addresses of the objects can change, so other objects should not store their pointers or indexes. I really should have mentioned that earlier, but I didn't think of it.

Anyway, the objects need to be visited periodically to update their status. If you want to take a shot at answering the original question, you could describe what sort of structure you organize the live objects into for fast access. Of course, it's unlikely that you have a method that's more efficient than array iteration! ;-)

#13065 - Miked0801 - Thu Dec 04, 2003 7:37 pm

Hehe. Objects and references. I'm such a straight C/Asm guy that talk like that makes me nervous :)

I'll go re-read the parent again at your request and see what I missed...

Ok, if there weren't going to be too many objects in the game, I'd not worry about reading too many 0s. But, if there were going to be lots (say more than about 50), I'd do it as a linked list traversal.

Why are we even worrying about this from a performance standpoint anyway (besides the fact that the question was asked)? This sort of code, while called every tic, isn't going to make or break an application unless this is all it's doing. The physics/collision/rendering code for these objects is going to take far more CPU time. From an academic view, it's fun to debate, but in real life, this is probably one of the last things you'd want to optimize seriously.

Mike

#13067 - torne - Thu Dec 04, 2003 7:42 pm

Miked0801 wrote:
Why are we even worrying about this from a performance standpoint anyway (besides the fact that the question was asked)?


You'd be surprised what can turn out to be a problem. Perhaps the original poster had profiled. =)

I write everything in asm, personally, but that's just because after I'd rewritten large chunks of GCC-generated Thumb code to be much faster, I gave up on its code generation. Much easier (for me, because I'm mad) to just do it in asm to start with (using soon-to-actually-exist tools such as my high-level assembler, which will hopefully end up being integrated into the Eclipse IDE with a semantically-aware editor and support for refactoring).

Assembler plan is at http://whitefang.fitz.cam.ac.uk/proposal/ if you're interested. =)

#13069 - Miked0801 - Thu Dec 04, 2003 7:52 pm

Lol. Perhaps he did indeed. I'm not yet used to the competence level on this board. I'll have to be a bit more careful with my own comments. :)

On to the assembly comments - please, this is not meant to start the old "Which is better C or Assembly" holy war. We all know all the different answers to that...

While I love programming in Assembler, I find I can get much more done in C and deal with the loss of performance that our not so wonderful C compiler gives us. If ever I need something to go real fast, then I'll go back and hand assemble it, but most of the time, I'm lazy and can deal with our menu subsystem (or whatever non performance critical area) running a bit slow. Besides, my coworkers enjoy reading C code much more than assembler. There's always that "It's assembly, I can't touch this or everything will explode" attitude. :)

This coming from a guy who worked on GBC Z80 for years. All hand assembly there, scrapping for every byte/cycle possible. My how my attitude has changed when given a somewhat decent compiler.

Mike

#13070 - sajiimori - Thu Dec 04, 2003 7:52 pm

Quote:

Objects and references. I'm such a straight C/Asm and guy that talk like that makes me nervous :)

Oh don't get me wrong...I am too! But the word 'variable' is so ambiguous. I mean, what the hell is a 'const variable'?! ;-) By 'object' I mean any block of memory that stores data, so it could mean a primitive value (like char or int or void*), an array, or a struct -- anything that takes up memory.

'Reference', on the other hand, is more general than 'pointer'. A reference can be an index into an array, or a unique ID of some sort, whereas a pointer is strictly a memory address.

#13073 - torne - Thu Dec 04, 2003 8:19 pm

Miked0801 wrote:
On to the assembly comments - please, this is not meant to start the old "Which is better C or Assembly" holy war. We all know all the different answers to that...


I wasn't intending to start a holy war; just mentioning it in passing.

Quote:
While I love programming in Assembler, I find I can get much more done in C and deal with the loss of performance that our not so wonderful C compiler gives us.


I generally find it's no great deal to write asm code; I can do it about as fast as C. Perhaps faster once I have my HLA tools. =)

Quote:
Besides, my coworkers enjoy reading C code much more than assembler. There's always that "It's assembly, I can't touch this or everything will explode" attitude. :)


Again, HLA allows you to write self-documenting assembler, just as I already write self-documenting code in other languages. Unit testing is the answer to 'can't touch this or everything will explode', which is not a phenomenon unique to assembly, but rather one common to any complex code. I'm working on an AsmUnit, but it will need a simulator, probably. Could always write one.

Quote:
My how my attitude has changed when given a somewhat decent compiler.


GCC's Thumb code generation is not exactly 'decent', and ARM's own compiler is still not as good as I'd like. I can beat GCC on almost any reasonably long code sequence. =)

#13075 - Miked0801 - Thu Dec 04, 2003 8:39 pm

Quote:

GCC's Thumb code generation is not exactly 'decent', and ARM's own compiler is still not as good as I'd like. I can beat GCC on almost any reasonably long code sequence. =)


Not a feat to do that :)

But the true question is, can that assembler be written as quickly as the C, and is it as easy for other people to work with? I guess if everyone on your project is working in Assembler, it wouldn't be that big of a deal. But here, assembler knowledge is not required, so it cannot be maintained by anyone but me or a few other leads.

Ah well, I'm dropping this thread here. Nothing to see, moving along ;)