gbadev.org forum archive

I am still fairly new here, but I have learned that the preferred way of programming for the GBA is to work mostly in C, while inline assembly seems to be a good idea in "time-critical" areas of the program. That last statement is what confuses me, and hopefully someone can clarify it for me.

Let's say that I want to program Tetris Clone Advance. Should I:
1. first program the entire thing in C, and then rewrite what I can in ARM assembly, or
2. program only the "time-critical" areas in ASM? I don't know what "time-critical" means; I am guessing it refers to code that needs to run as fast and efficiently as possible.

Thanks
_________________
http://people.umass.edu/ozahid

Usually, the thing to do is to program everything in C or C++ (if C/C++ is your language of choice) and then, if needed, convert time critical routines to assembly. You might never need to use assembly. This really depends on what you are trying to do.

"Time-critical" implies that there is a limited amount of time within which to get a certain job done, and that either the limit is tight enough or the algorithm is complicated enough that you are unlikely to finish the routine in time without some optimization.

On the GBA, time-critical routines are often interrupt service routines (ISRs) that must execute in a very brief period of time, such as those that generate special effects on every individual scanline, handle multiplayer communications, or mix sound into the sound buffers.
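To put "a very brief period of time" in numbers, here is a small C sketch of the GBA display timing as documented in GBATEK: 4 CPU cycles per dot, 1232 cycles per scanline (of which 272 are HBlank), and 228 scanlines per frame. The helper names are made up for illustration, and the model ignores memory wait states and interrupt entry/exit overhead, so treat the numbers as the most you could possibly get.

```c
#include <stdint.h>

/* GBA display timing in CPU cycles (CPU clock = 2^24 Hz, ~16.78 MHz).
   A scanline is 308 dots long and each dot takes 4 cycles. */
enum {
    CYCLES_PER_DOT   = 4,
    HDRAW_CYCLES     = 240 * CYCLES_PER_DOT,               /* 960, visible */
    HBLANK_CYCLES    = 68 * CYCLES_PER_DOT,                /* 272 */
    CYCLES_PER_LINE  = HDRAW_CYCLES + HBLANK_CYCLES,       /* 1232 */
    VISIBLE_LINES    = 160,
    VBLANK_LINES     = 68,
    LINES_PER_FRAME  = VISIBLE_LINES + VBLANK_LINES,       /* 228 */
    CYCLES_PER_FRAME = CYCLES_PER_LINE * LINES_PER_FRAME   /* 280896 */
};

/* Nonzero if a handler costing isr_cycles finishes within one HBlank. */
int fits_in_hblank(uint32_t isr_cycles) {
    return isr_cycles <= HBLANK_CYCLES;
}

/* Percent of a whole frame spent by a handler of isr_cycles that runs
   once on each of `lines` scanlines (simplified: no wait states and no
   interrupt entry/exit overhead). */
double hblank_isr_frame_percent(uint32_t isr_cycles, uint32_t lines) {
    return 100.0 * (double)isr_cycles * (double)lines / CYCLES_PER_FRAME;
}
```

For example, a hypothetical 200-cycle HBlank handler running on the 160 visible lines uses 32,000 cycles, roughly 11.4% of the frame, before any of the overhead above is counted.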

On a larger, more general scale, a time-critical routine is often a very small portion of the code in which the processor spends almost all of its time. The 80/20 rule, also known as the Pareto Principle, roughly states that 80% of the problem is due to 20% of the source (or, applied to programming, that 80% of the computer's time is spent in 20% of your code). This is the code you want to find (by profiling) and ultimately optimize (perhaps into ASM, although not necessarily).

Dan.

Thinking about this: although it is often said that you might not need to use assembly, how often is that actually the case? In these two scenarios, would well-written C code be good enough?

1 - My current project: a platformer without many special effects. At the moment, the only thing I want to do with interrupts is create a wavy screen by changing the BG offsets each HBlank, and maybe add sound later.

2 - An idea I had for a rhythm action game like Frequency. This would use Mode 7 techniques to draw a 3D plane that travels toward you, plus sound, and probably keypress interrupts for accurate timing.
_________________

Code:

CanIKickIt(YES_YOU_CAN);

Heavily optimized (as opposed to well-written) C code might be good enough to achieve the effect you desire. (Heck, you can do Mode 7 without interrupts entirely if it's your only raster effect by using DMA0.)

The problem is that the process of heavily optimizing C code generally means predicting what its assembly equivalent will wind up being, at which point you may as well just be writing assembly.

Another problem is that things aren't always stackable... for example, I might be able to write an interrupt-driven sound mixer/music player in plain C and get it working, but if in my game I have an HBlank effect that needs to steal cycles from it then having a tightly-optimized assembly version may be more useful.

Dan.

poslundc wrote:

Heck, you can do Mode 7 without interrupts entirely if it's your only raster effect by using DMA0.

Interesting. I might have to find some examples of that; I assume I could also use it for a rippling background effect?
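As a sketch of the rippling effect, assuming a text-mode background: the core idea is one horizontal scroll value per scanline, rebuilt into a table once per frame. This host-side C version only builds the table; on hardware you would feed it to the BG horizontal offset register line by line, either from an HBlank interrupt or with DMA0 repeating on HBlank as described above, and that hardware setup is not shown. The names `build_wave_table` and `sine32` and the 32-step quarter-wave sine table are illustrative choices, not from any library.

```c
#include <stdint.h>

#define SCREEN_LINES 160

/* Quarter-wave sine table: round(64 * sin(k * 90deg / 8)), k = 0..8. */
static const int8_t quarter_sine[9] = { 0, 12, 24, 36, 45, 53, 59, 63, 64 };

/* Sine over a 32-step period, scaled so that 1.0 == 64. */
static int sine32(unsigned phase) {
    phase &= 31;
    if (phase <= 8)  return  quarter_sine[phase];
    if (phase <= 16) return  quarter_sine[16 - phase];
    if (phase <= 24) return -quarter_sine[phase - 16];
    return -quarter_sine[32 - phase];
}

/* Fill one horizontal scroll value per scanline. base_x is the normal
   scroll, amplitude is the peak offset in pixels, frame animates the wave. */
void build_wave_table(uint16_t table[SCREEN_LINES], int base_x,
                      int amplitude, unsigned frame) {
    for (int line = 0; line < SCREEN_LINES; ++line) {
        int offset = (sine32(line + frame) * amplitude) / 64;
        /* Text BG horizontal offset registers only use the low 9 bits. */
        table[line] = (uint16_t)((unsigned)(base_x + offset) & 0x1FFu);
    }
}
```

Incrementing `frame` each VBlank makes the wave travel down the screen; a larger amplitude or a different step per line changes how tight the ripple looks.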
_________________

Code:

CanIKickIt(YES_YOU_CAN);

Personally, I have found that diving into ASM for the GBA has done nothing but waste my time. I know that it is essential to do sometimes, and I think I have a strong grasp on assembly in general, but I have projects where I have used floats and division (not BIOS or shifts) everywhere with the intention of optimising it later and have never had to.

In general, my golden rule is to work in C/C++; if something has issues, it can usually be traced back to a design problem rather than a shitty compiler.
_________________
www.hungrydeveloper.com
Version 2.0 now up - guaranteed at least 100% more pleasing!

Yup, very much so. The only time I've managed to slow down the hardware so far was when I had two scrolling backgrounds that updated the entire screen, because I couldn't be arsed to learn how to update just the strips at the edges.
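Updating just the strips at the edges can be sketched like this, assuming a single 32x32-tile text background screenblock and a level exactly 32 tiles tall. At most 31 tile columns are visible at once (30, plus one partly scrolled in), so the 32nd screenblock column is always off screen, and each new level column can be written into slot `map_col % 32` just before it appears. `MAP_W`, `MAP_H`, `copy_column`, and `scroll_edges` are hypothetical; on hardware you would also write the scroll position to the BG horizontal offset register.

```c
#include <stdint.h>

#define SB_SIZE   32   /* a text BG screenblock is 32x32 tile entries */
#define VIEW_COLS 31   /* 240 px / 8 = 30 columns, +1 partly scrolled in */
#define MAP_W     128  /* hypothetical level width in tiles */
#define MAP_H     32   /* hypothetical level height: one screenblock tall */

/* Write level column map_col into the screenblock column it wraps to. */
static void copy_column(uint16_t sb[SB_SIZE][SB_SIZE],
                        uint16_t map[MAP_H][MAP_W], int map_col) {
    int sb_col = map_col & (SB_SIZE - 1);
    for (int row = 0; row < MAP_H; ++row)
        sb[row][sb_col] = map[row][map_col];
}

/* Scroll from old_x to new_x (pixels, both >= 0, less than one screen
   apart). Copies only the columns entering the view; returns how many. */
int scroll_edges(uint16_t sb[SB_SIZE][SB_SIZE], uint16_t map[MAP_H][MAP_W],
                 int old_x, int new_x) {
    int old_col = old_x >> 3, new_col = new_x >> 3, copied = 0;
    if (new_col > old_col) {            /* moving right: fill right edge */
        for (int c = old_col + VIEW_COLS; c < new_col + VIEW_COLS; ++c)
            if (c < MAP_W) { copy_column(sb, map, c); ++copied; }
    } else if (new_col < old_col) {     /* moving left: fill left edge */
        for (int c = new_col; c < old_col; ++c) {
            copy_column(sb, map, c);
            ++copied;
        }
    }
    return copied;
}
```

You would still fill the initial view once at level start, and a jump of more than a screen width needs a full redraw instead. Most frames copy zero or one column (32 entries) rather than the whole 32x32 block.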
_________________

Code:

CanIKickIt(YES_YOU_CAN);

poslundc wrote:

The problem is that the process of heavily optimizing C code generally means predicting what its assembly equivalent will wind up being, at which point you may as well just be writing assembly.

However, the C is still easier to read, and it lets one release bit-equivalent mixers and rasterizers in the GBA version and in a hypothetical downloadable demo of a PC port of a GBA game.

Quote:

Another problem is that things aren't always stackable... for example, I might be able to write an interrupt-driven sound mixer/music player in plain C and get it working, but if in my game I have an HBlank effect that needs to steal cycles from it then having a tightly-optimized assembly version may be more useful.

TOD M3 runs a C-based mixer (albeit mono) and HDMA-based Mode 7 at the same time. The trick is to do as much as you can with DMA.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

I'll chime in here a bit.

The only places in all our games where we use assembly are:

1. Decompression routines (#2 speed hit in our games; everything in our games is heavily compressed, so these routines take a ton of our time. This is what profiling is for.)
2. crt0, as it is called a ton
3. Mixers in sound libraries (#1 speed hit in our games)
4. Our sprite Y-sort routine (Shell sort; it was #3, but converting it to assembly made it ~20x faster and half the size.)
5. Before we switched to edge-fill scrolling, our scroll routines.
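Since decompression ranks so high, here is what a plain C version of one common format looks like: the LZ77 variant understood by the GBA BIOS LZ77UnComp calls (header byte 0x10 and a 24-bit decompressed size, followed by groups of eight blocks controlled by a flag byte read MSB first). This is a host-side reference sketch, not the routine described in the post, and `lz77_decompress` is a hypothetical name; it also skips bounds checks on the input.

```c
#include <stddef.h>
#include <stdint.h>

/* Decompress GBA-BIOS-style LZ77 data (type 0x10). Returns the
   decompressed size, or 0 on a bad header or bad back-reference. */
size_t lz77_decompress(const uint8_t *src, uint8_t *dst, size_t dst_cap) {
    if ((src[0] & 0xF0) != 0x10) return 0;
    size_t size = src[1] | (src[2] << 8) | ((size_t)src[3] << 16);
    if (size > dst_cap) return 0;
    src += 4;
    size_t out = 0;
    while (out < size) {
        uint8_t flags = *src++;              /* 1 bit per block, MSB first */
        for (int bit = 7; bit >= 0 && out < size; --bit) {
            if (!(flags & (1u << bit))) {    /* 0: literal byte */
                dst[out++] = *src++;
            } else {                         /* 1: copy from earlier output */
                unsigned len  = (src[0] >> 4) + 3;
                size_t   disp = (((size_t)(src[0] & 0x0F) << 8) | src[1]) + 1;
                src += 2;
                if (disp > out) return 0;    /* points before the start */
                /* Byte-by-byte so overlapping copies repeat patterns. */
                for (unsigned i = 0; i < len && out < size; ++i, ++out)
                    dst[out] = dst[out - disp];
            }
        }
    }
    return size;
}
```

For example, the 9-byte stream `10 08 00 00 20 41 42 30 01` decodes to "ABABABAB": two literals, then one back-reference that copies 6 bytes from 2 bytes back.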

And for fun:
6. Our division routine, though if we ran low on IWRAM, this would be the first to revert to BIOS calls. Needed in Mode 7/3D games; otherwise it's kind of fluff.
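A Mode 7 renderer mostly needs reciprocals (a perspective scale per scanline), so one common alternative to a general division routine is a reciprocal lookup table: precompute ceil(2^16 / d) once, then divide with a multiply and a shift. This is a swapped-in technique for illustration, not the routine described above, and `RECIP_COUNT`, `recip_init`, and `recip_div` are assumptions. With 1024 four-byte entries it costs 4 KB, which is the kind of IWRAM (32 KB total) trade-off mentioned above if you place it there.

```c
#include <stdint.h>

#define RECIP_SHIFT 16
#define RECIP_COUNT 1024   /* hypothetical: covers divisors 1..1023 */

static uint32_t recip_table[RECIP_COUNT];

/* Precompute rounded-up reciprocals 2^16 / d once at startup. */
void recip_init(void) {
    recip_table[0] = 0;                                   /* d = 0 unused */
    for (uint32_t d = 1; d < RECIP_COUNT; ++d)
        recip_table[d] = ((1u << RECIP_SHIFT) + d - 1) / d;   /* ceil */
}

/* Approximate n / d (1 <= d < RECIP_COUNT) with one multiply and a shift. */
uint32_t recip_div(uint32_t n, uint32_t d) {
    return (uint32_t)(((uint64_t)n * recip_table[d]) >> RECIP_SHIFT);
}
```

For small numerators this matches integer division (for example, `recip_div(1000, 10)` gives 100 and `recip_div(299, 3)` gives 99), but it can be off by one for larger numerators, so check the ranges your renderer actually uses.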

What's not in assembly that is surprising?
1. Our HBlank (VCount)/VBlank interrupts, as the compiler handles these small routines fine on its own, given special care in coding.
2. Our serial communications code: it's so big and complicated that it's hard to justify that much space for it, especially when a lot of its time is hardware-dependent.
3. Our base actor handler, which probably should be in assembly (#3 speed hit for most games) :)

Why do you use Shell sort? Isn't it a bit slow? Why not an O(n log n) implementation? OK, I guess it won't matter much with just a few sprites...
_________________
Team Pokeme
My blog and PM ASM tutorials

Lupin wrote:

Why do you use Shell sort? Isn't it a bit slow? Why not an O(n log n) implementation? OK, I guess it won't matter much with just a few sprites...

Right, and at small array sizes the constant factor dominates. Some have reported that the constant factor is bigger with heapsort or merge sort than with Shell sort. (Quicksort uses extra RAM for its recursion stack, and practical implementations fall back to insertion sort, of which Shell sort is a gapped variant, for small sub-ranges.) In addition, Shell sort runs in roughly O(n log n) on nearly sorted data, and the Y positions of sprites that were sorted in the previous frame are likely to be nearly sorted in this frame.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

Couldn't have said it better myself :)

The overhead for Shell sort is so small that it beats other sorts for the datasets we use it on. Heck, bubble sort is (almost) fine for small, nearly sorted data sets. We usually sort between 10 and 70 objects per game loop, so our routine gets in, gets out, and is done plenty quick.

Bubble Sort - hehe, at N=60 that's roughly 1800 comparisons (N^2/2). It was our original version and was butt slow.

Insertion Sort - very simple, but for higher Ns, N^2/8 adds up quickly (60 would be 450ish).

Quick Sort == big stack space, or silly complexity in versions that eliminate tail recursion. For our data sets it's overkill, and since it goes in and out of memory so much, it wouldn't be an ideal GBA sort anyway, as our data is mostly in EWRAM.

Heap Sort - Nice on all datasets, but a whole lot of setup and exit code on the implementations I've seen.

For N around 60, n log n (with a base-10 log) and n^1.25 come out pretty close anyway (about 107 vs. 167), but overhead kills the heap sort advantage. My Shell sort is around 200 bytes and compares on Y position, sprite priority, secondary priority, and a unique number. (The last is needed because my sort isn't stable, which otherwise causes sprites with exactly equal comparison values to flicker back and forth.)
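Here is a sketch of a Shell sort over the four-part key described above (Y position, sprite priority, secondary priority, then a unique number). Because the unique number makes every key distinct, the output order is fully determined even though Shell sort is not stable, which is what keeps otherwise-equal sprites from trading places between frames. The `SpriteKey` layout, its field names, and the gap sequence (the first few gaps of Ciura's sequence) are assumptions, since the post doesn't specify them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sprite sort record; field names are illustrative. */
typedef struct {
    int16_t  y;
    uint8_t  priority;
    uint8_t  sub_priority;
    uint16_t id;          /* unique per sprite: makes every key distinct */
} SpriteKey;

/* Nonzero if a sorts before b: Y, then priority, sub-priority, id. */
static int sprite_before(const SpriteKey *a, const SpriteKey *b) {
    if (a->y != b->y) return a->y < b->y;
    if (a->priority != b->priority) return a->priority < b->priority;
    if (a->sub_priority != b->sub_priority)
        return a->sub_priority < b->sub_priority;
    return a->id < b->id;
}

/* Shell sort: gapped insertion sort passes with shrinking gaps. */
void shell_sort_sprites(SpriteKey *s, size_t n) {
    static const size_t gaps[] = { 57, 23, 10, 4, 1 };
    for (size_t g = 0; g < sizeof gaps / sizeof gaps[0]; ++g) {
        size_t gap = gaps[g];
        for (size_t i = gap; i < n; ++i) {
            SpriteKey tmp = s[i];
            size_t j = i;
            while (j >= gap && sprite_before(&tmp, &s[j - gap])) {
                s[j] = s[j - gap];    /* shift larger entries up one gap */
                j -= gap;
            }
            s[j] = tmp;
        }
    }
}
```

On data that was sorted last frame, most inner `while` loops exit after a single comparison, which is why nearly sorted input is cheap.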

Again, I could use an O(N) sort like bucket or radix sort, but come on! The complexity and RAM overhead make that as silly as bogosort :)


ASM > GBA C Programming with inline ARM ASM

#29611 - ozahid - Mon Nov 22, 2004 4:16 pm

#29612 - isildur - Mon Nov 22, 2004 4:22 pm

#29615 - poslundc - Mon Nov 22, 2004 5:55 pm

#30773 - identitycrisisuk - Sat Dec 04, 2004 3:41 pm

#30777 - poslundc - Sat Dec 04, 2004 4:49 pm

#30779 - identitycrisisuk - Sat Dec 04, 2004 5:04 pm

#30785 - MumblyJoe - Sat Dec 04, 2004 5:55 pm

#30790 - identitycrisisuk - Sat Dec 04, 2004 6:30 pm

#30810 - tepples - Sun Dec 05, 2004 1:46 am

#31181 - Miked0801 - Wed Dec 08, 2004 7:12 pm

#31208 - Lupin - Wed Dec 08, 2004 9:53 pm

#31221 - tepples - Thu Dec 09, 2004 12:14 am

#31328 - Miked0801 - Fri Dec 10, 2004 3:24 am