gbadev.org forum archive

I would just like to know how you acheive real time and/or consecutive programming (round robin). I have being running through the many programming tutorials around and now have a good idea of how to program the gba but still no idea how to do this.

I have a feeling it is done like this.
Set up a timer that interupts the program every few times a second.
Every time this happens the place where it leaves the program saved in memory and I load in to the program counter place that I want to load up the next process for the GBAs memory(as it was saved the last time that process had a time slice).
The program goes through the list of processes befor starting at the beginning.

this way i can constantly skip between the sprites backgrounds etc of the program. It is all well and good having the theory, I just don't know how to implement it.

Any other ideas of how people have acheived this type of thing feel free to post as i'm not too bothered how i do this more that it works in a flexible way.

You mean like an operating system, where it gives chunks of time to seprate programs, or just splitting up the work of a single game engine over several fames kind of thing? Cause if it can do a full game frame in one HW frame, then there's no need for jumping between processes. Otherwise, I'd try to make it where it does everything that doesn't display immediately on one frame (so you don't see a half-finished frame), and then everything that will display on the VBlank of the second.
And that doesn't involve any process jumping either. Generally there's no need for it in a console game. You either do everything at once, or split it up very organized-ly.

For an operating system-type thing, (I have no experience or knowledge about them, so this is just an idea) I'd have a while(time < timePerProcess) loop for each program's main section, where it processes all the messages it's recieved since last frame, or as many as it can get done before it runs out of time, and then move onto the next program. timePerProcess would probably just be the number of cycles between the start of the frame and VBlank divided by the number of programs running.
You'd need to adjust timePerProcess after each program finishes its thing, since it would probably either finish way early, or go over the time limit if it happened to be doing something time-consuming.
Then have another loop through the programs on VBlank for them to update their display stuff.

That's how I'd do it anyway, but like I said, I've never read a thing about OS programmnig, I just made all that up^_^
I think it would work though.

...or did you mean something else entirely?

first make a procedure that allow you to save the CPU context of a fonction (by context i mean the registers and everything that is needed to start/stop/restart this procedure)

next make a timer routine (under interruption) that wake up every 10ms or something like that

next do a schedule() function that manage a list of function that should be run and which choose the one which we need to give the next cpu slice
(do a simple round robin policy or random policy at first for testing.
round robin is simple to code you have a task list and you choose the i+1 one if i was the last one which was run)

once you have all this working glue everything together
you initialise your task list with the context for each functions
you launch the timer
you launch the first task
when the timer generate an iterruption you save the context for the task that was running you called schedule to find a new one you activate the context for this task, do not forget to re enable the timer and this will do the tricks

An old school way for switching tasks in C is to use setjump()/longjump() and have a separate stack space for each task. A little bit harder is to make sure the tasks access global variables in a nice manner.

pbtmp: sounds good, though the scheduler I'm writing is hard realtime and understands which tasks should be scheduled in vblank or vfill. Context switching on the ARM is a little confusing, though, and if anyone is trying to implement this I strongly recommend looking at the ARM ARM (Architectural Reference Manual) which is available to download from various places online (though not from ARM's site... I think my copy came from Altera). It has sample assembly code for doing a context switch on an ARMv3/4 chip.

Thread synchronisation is another more complex issue entirely; the ARM's swap instruction lets you implement simple boolean mutexes relatively easily, but semaphores and other structures are a little harder. Again, there are examples in the ARM ARM, but I don't like their mutex implementation; it's slow. =)

Running several threads is not likely to be worth the effort on the GBA for game or simple application development; the overheads involved are pretty huge, and it's a pain to allocate per-thread stacks in the tiny amount of waitless memory you have available.

Torne

torne: i know i have a similar hard RT scheduler (not yet 100% finished) in my operating system prototype, but i simplify the approach to have a more understable post. Round robin is a good implementation to start and its is quite easy to update to a more efficient RT policy like RM or EDF or anything else

thx for the ARM manual which i didnt know

=)

That cleared things up a little.

The reason I want to do this is that I want to create a game where all you have to do is have one procedure with all the tests in and loops that are needed to control the sprites backgrounds etc. without he need to split what they do between loops with masses of variables to say which of those states they are in. This way I believe I can create better AI and more compact code because a single sprite is within one procedure.

This may in truth appear not to be the case but all my previous experience with games programming has included this feature and I like to be able to use it again.

Vortex by the way where can you find the functions setjump() and longjump() as they may be sufficient. Otherwise i'm going to try out the other aprouches.

cooky, it sounds like you want to be doing the same thing every frame, so why not just have your vblank bottom half trigger one cycle through each of your functions? That way you don't need a scheduler, or preemption.

T.

Perhaps as a suggestion for handling such things (having experienced such things as doing many things at once).

In C you have single main program loop that has certain events that it branches off for. Then a process loop (seperate) that is called while waiting for interrupts or other events. Interrupts should have very little code in them. For example horizontal timer interrupt should set a flag that's polled in the process loop. If the process loop is too slow then increasing the code in the horizontal interrupt is needed. However only events like serial data (what to do with Joypad data) etc. is actually handled by the process loop. Waiting for a keypress for example or checking collisions or updating sprite positions every vertical interrupt. All easily handled in the process loop.

The program then becomes event oriented and what happens to the display is a result of the process state. It sounds overly simple, but it's not. I've done it so it's not hard.

This is a bit STACK messy though because you are constantly bipping between functions and checking states. In otherwords the stack is in a constant state of being ON stack.

For example playing the game is actually a sub function of the main program (whose function is merely to start a new game continue a game run a demo or show the intro. Each function from there is a sub routine of this. Anyhow I hope this makes sense.

I've used this technique on many projects, very few applications require intense program loops but it's often best to keep things simple and waste as little cpu power as possible. :) This does work I've noticed with this concept.

For C++ any event queue might be good, each object is sent messages such as "you've got the right key pressed" and it should move acordingly. Obviously this is not 'super' efficient but it keeps things clean. And allows you to figure out where to look for problems. I'm using this approach with my current little project.

Cyb

torne: actually one of the points of this is that i don't what to be doing the same thing every frame more that i could be doing any one of a number of things in a frame depening on what tasks i have called in that area this way a going through the functions approach isn't suitable as cyberman surgested(where each function represents a state of the process and the function is decided apon using global variables). Sorry but i will see what I can do with the ideas any way.
_________________
Rolling a six is unlikely but how do you know if you have never picked up the dice.
www.ceorron.co.uk

Yes, but the set of things you could be doing in a frame is presumably static, so enabling/disabling various sets of functions to be called would surely work? Doing boolean tests doesn't take much time. =)

T

torne wrote:

Yes, but the set of things you could be doing in a frame is presumably static, so enabling/disabling various sets of functions to be called would surely work? Doing boolean tests doesn't take much time. =)

T

Your beginning to convice me now but it seems alot of useless tests to set up one area different from the last. Surley wont tha get slow after a while or do you surgest setting up a different loop for each area? (!)
_________________
Rolling a six is unlikely but how do you know if you have never picked up the dice.
www.ceorron.co.uk

You could have more than one loop per area, yes; all you need to do is have them all as seperate functions, and have some variable that can be set which will cause the loop to exit and the right function called next.

Having a huge bunch of tests really isn't a problem, though; if they are all testing the same variable(s) and there are few enough variables that they can stay in registers, it'll only take two or three clock cycles to run each test, and since you have 83766 cycles in each vertical blank interval, it's not going to consume much time. =)

Example of the first:

Code:

void main()
{
while ( true )
{
switch ( state )
{
case 1: state1(); break;
case 2: state2(); break;
case 3: state3(); break;
}
}
}

void state1()
{
while ( state == 1 )
{ do stuff for state 1 }
}

void state2()
{
while ( state == 2 )
{ do stuff for state 2 }
}
...

I'm sure you can see how that would work. For the second:

Code:

if ( state == 1 )
{
do state1 stuff
}
if ( state == 2 || state == 3 )
{
blah blah
}
if ( state == 1 )
{
more state 1 stuff
}

and so on. It doesn't look very 'tidy' but it's fast; skipping over an if statement only takes two cycles if the values tested are already in registers. Compare that to the scheduling overhead of a multithreading system; it takes at least fifty cycles to switch thread, and that's without considering any clever goings-on. =)

T.

[/code]

torne wrote:

...Example of the first:

Code:

void main()
{
while ( true )
{
switch ( state )
{
case 1: state1(); break;
case 2: state2(); break;
case 3: state3(); break;
}
}
}

void state1()
{
while ( state == 1 )
{ do stuff for state 1 }
}

void state2()
{
while ( state == 2 )
{ do stuff for state 2 }
}
...

An even neater way of doing this is to use a function pointer. That way you can save the overhead of the switch satement:

Code:

//declare your state function pointer and initialise to state1

void(*state)() = state1;

//define some state functions

void state1() {

// do stuff for state 1
...
if(foo){
state = state2; //switch state
}
}

void state2() {

//do stuff for state 2
....
if(bar){
state = state1; //switch state
}
}
...

then in your main game loop, you can invoke the current states process by dereferencing the pointer - no conditional checks required!

Code:

....

//process the current state

(*(currentObject->state)) ();

hope this is some help

cheers

Col

The 'overhead' of a switch statement on integers is almost zero, and it saves messing about with function pointers, which if you don't know exactly how they work, are a bit odd sometimes, especially with ARM/Thumb interworking. If the values in the switch are small natural numbers (0,1,2,3 or something) it gets compiled into a jump table which takes exactly the same number of memory accesses as a function pointer to run. =)

T.

torne wrote:

The 'overhead' of a switch statement on integers is almost zero, and it saves messing about with function pointers, which if you don't know exactly how they work, are a bit odd sometimes, especially with ARM/Thumb interworking. If the values in the switch are small natural numbers (0,1,2,3 or something) it gets compiled into a jump table which takes exactly the same number of memory accesses as a function pointer to run. =)

T.

If the switch statement is compiled into a jump table surely there will still be an extra level of indirection:
the jump to the correct case block + the state function call?

Or can gcc optimise out the jump to the case block, and call the state() directly? that would be cool :)

I guess even if the cpu efficiency is the same, I like the neatness/brevity of the pointer approach. So can the switch approach be faster than the pointer approach?

Also, in what way does calling a function normally with interworking differ from dereferencing a function pointer with interworking?

cheers

Col

It uses the same number of memory accesses, which is the real speed factor on the GBA. It'll take one more instruction than if you constructed your own jump table, as gcc doesn't have tree rewrites powerful enough to realise that the switch is made of single function calls. The 'fastest' way is to use a jump table made of pointers, but you'll have to make that yourself, and it won't be much different. =)

Calling a function through a pointer properly in interworking code is awkward because all pointers to Thumb code have to be coerced to have their low bit set to 1, so that the bx instruction will do the right thing. This can add extra code in some cases, depending on the way you call functions and the compiler options that you use. Calling functions directly doesn't have this problem because ARM-target linkers know how to construct shims which switch instruction sets.

Torne

Hi again,

I decided to forget about theory and do some profiling :)

the pointer approach is (only) ~2cycles faster than the switch approach, which isn't a big win, but as the code is also smaller and the source is neater, i'll stick with it for now.

EDIT

this is a load of codswallop!
my stupidity allowed me to happily use visual boy advance for this test - of course i realised immediately after hitting submit that i should be testing on hardware doh!

so heres the real deal

on hardware the pointer approach is ~19 cycles slower than using a switch statement !!!

ouch I'm gonna have a look at the output as soon as i have time :)

cheers

EDIT2

ha after using some unreachable recursion to stop gcc from inlining the dummy functions into the switch statement, a different story transpires...

now on hardware i get pointers as ~12 cycles faster per iteration!
the saga continues!!!

the code is thumb running from rom with interworking on optimisation set to -O3.

(btw gcc seems to gracefully handle the interworking bit in the pointer! if you take the address of a thumb function with interworking set, the lowest bit is a 1!)

heres the test i'm using - i am now less then sure that it is valid - so please rip it up :)

Code:

static volatile u32 val = 0;
static volatile u32 caseIdx = 2;
static void (* volatile funcPtr)() = dummyState1;

void testCaseVsFuncPtr(){

u32 timeTaken = 0;

//test pointer approach
initTimer(); //reset timer
for(u32 i = 0; i < 10000; ++i){
   (*funcPtr)();
}
timeTaken = readTimer();

dprint("\ntime for funcPtr=");
dprint(timeTaken);

//test switch approach
initTimer();
for(u32 i = 0; i < 10000; ++i){
   switch(caseIdx){
   case(0): dummyState4(); break;
   case(1): dummyState2(); break;
   case(2): dummyState1(); break;
   }
}
timeTaken = readTimer();

dprint("\ntime for switch =");
dprint(timeTaken);
}

void dummyState1() {
if(val > 100000) {
   dummyState1();
}
++val;
}
void dummyState2(){ dummyState1(); }
void dummyState3(){ dummyState1(); }
void dummyState4(){ dummyState1(); }

void initTimer() {
REG_TM3DAT = 0;
REG_TM3CNT = 0;
REG_TM2DAT = 0;
REG_TM2CNT = 0;
REG_TM3CNT = 0x84;
REG_TM2CNT = 0x80;
}
u32 readTimer() {
return( (u32)(REG_TM3DAT << 16) | (u32)(REG_TM2DAT) );
}

(the profiling code is borrowed from Kevin Weatherman)

i guess the conclusion (so far) is - if your functions are small, gcc can inline them in the switch statements for a big win, otherwise the pointers are the way to go...

cheers

Col

quick update :)

It seems that there gcc can cope with function pointers and interworking ONLY if the pointers are not to member functions - no great hardship really :)

cheers

Col

gbadev.org forum archive

Coding > Real time/Consecutive programming

#7067 - cooky - Sat Jun 07, 2003 9:04 pm

#7070 - DekuTree64 - Sat Jun 07, 2003 10:57 pm

#7090 - pbmtp - Mon Jun 09, 2003 10:49 am

#7119 - Vortex - Mon Jun 09, 2003 10:44 pm

#7122 - torne - Mon Jun 09, 2003 11:11 pm

#7142 - pbmtp - Tue Jun 10, 2003 3:52 pm

#7170 - cooky - Wed Jun 11, 2003 12:08 pm

#7177 - torne - Wed Jun 11, 2003 3:12 pm

#7202 - Cyberman - Thu Jun 12, 2003 3:34 am

#7333 - cooky - Sun Jun 15, 2003 12:29 pm

#7334 - torne - Sun Jun 15, 2003 1:15 pm

#7335 - cooky - Sun Jun 15, 2003 1:19 pm

#7357 - torne - Mon Jun 16, 2003 11:14 am

#7362 - col - Mon Jun 16, 2003 12:05 pm

#7365 - torne - Mon Jun 16, 2003 1:26 pm

#7372 - col - Mon Jun 16, 2003 2:36 pm

#7380 - torne - Mon Jun 16, 2003 3:52 pm

#7391 - col - Mon Jun 16, 2003 10:17 pm

#7418 - col - Tue Jun 17, 2003 4:39 pm