17. BIOS Calls


Apart from hardware interrupts, like HBlank and cartridge interrupts, there are also things called software interrupts, also known as BIOS calls. The software interrupts work very much like ordinary functions: you set-up input, call the routine, and get some output back. The difference lies in how you reach the code; for normal functions you just, well, jump to the routine you want. Software interrupts use the swi instruction, which diverts the program flow to somewhere in BIOS, carries out the requested algorithm and then restores the normal flow of your program. This is similar to what hardware interrupts do, only now you raise the interrupt programmatically. Hence: software interrupt.

The GBA BIOS has 42 software interrupts, with basic routines for copying, math (division, square root), affine transformations for sprites and backgrounds, decompression among others. There are also some very special functions like the IntrWait routines, which can stop the CPU until a hardware interrupt occurs. The VBlank variant is highly recommended, which is what makes this chapter important.

Using software interrupts isn’t too hard if it weren’t for one thing: the swi instruction itself. This again requires some assembly. However, not much assembly, and it’s easy to write C wrappers for them, which we’ll also cover here.

The BIOS functions

Calling the BIOS functions can be done via the ‘swi n’ instruction, where n is the BIOS call you want to use. Mind you, the exact numbers you need to use depends on whether your code is in ARM or Thumb state. In Thumb the argument is simply the n itself, but in ARM you need to use n<<16. Just like normal functions, the BIOS calls can have input and output. The first four registers (r0-r3) are used for this purpose; though the exact purpose and the number of registers differ for each call.

Here’s a list containing the names of each BIOS call. I am not going to say what each of them does since other sites have done that already and it seems pointless to copy their stuff verbatim. For full descriptions go to GBATEK, for example. I will give a description of a few of them so you can get a taste of how they work.

Full list

id Name id Name
0x00 SoftReset 0x08 Sqrt
0x01 RegisterRamReset0x09 ArcTan
0x02 Halt 0x0A ArcTan2
0x03 Stop 0x0B CPUSet
0x04 IntrWait 0x0C CPUFastSet
0x05 VBlankIntrWait 0x0D BiosChecksum
0x06 Div 0x0E BgAffineSet
0x07 DivArm 0x0F ObjAffineSet
0x10 BitUnPack 0x18 Diff16bitUnFilter
0x11 LZ77UnCompWRAM 0x19 SoundBiasChange
0x12 LZ77UnCompVRAM 0x1A SoundDriverInit
0x13 HuffUnComp 0x1B SoundDriverMode
0x14 RLUnCompWRAM 0x1C SoundDriverMain
0x15 RLUnCompVRAM 0x1D SoundDriverVSync
0x16 Diff8bitUnFilterWRAM 0x1E SoundChannelClear
0x17 Diff8bitUnFilterVRAM 0x1F MIDIKey2Freq
0x20 MusicPlayerOpen 0x28 SoundDriverVSyncOff
0x21 MusicPlayerStart 0x29 SoundDriverVSyncOn
0x22 MusicPlayerStop 0x2A GetJumpList
0x23 MusicPlayerContinue
0x24 MusicPlayerFadeOut
0x25 MultiBoot
0x26 HardReset
0x27 CustomHalt

Div, Sqrt, Arctan2 and ObjAffineSet descriptions

0x06: Div

r0: numerator

r1: denominator


r0: numerator / denominator

r1: numerator % denominator

r3: abs(numerator / denominator)

Note: do NOT divide by zero!
0x08: Sqrt

r0: num, a unsigned 32-bit integer


r1: sqrt(num)

0x0a: ArcTan2

r0: x, a signed 16bit number (s16)

r1: y, a signed 16bit number (s16)


r0: x≥0 : θ= arctan(y/x) ∨ x<0 : θ= sign(y)*(π − arctan(|y/x|).

This does the full inverse of y = x*tan(θ). The problem with the tangent is that the domain is a semi-circle, as is the range of arc tangent. To get the full circle range, both x and y values are required for not only their quotient, but their signs as well. The mathematical range of θ is [−π, π⟩, which corresponds to [−0x8000, 0x8000⟩ (or [0, 2π⟩ and [0, 0xFFFF] if you like)
0x0f: ObjAffineSet

r0: source address

r1: destination address

r2: number of calculations

r3: Offset of P matrix elements (2 for bgs, 8 for objects)

The source address points to an array of AFF_SRC structs (also known as ObjAffineSource, which is a bit misleading since you can use them for backgrounds as well). The AFF_SRC struct consist of two scales sx , sy and an angle α, which again uses the range [0, 0xFFFF] for 2π. The resulting P:

(17.1)     P = [ s x cos α s x sin α s y sin α s y cos α ]

By now you should know what this does: it scales horizontally by 1/sx, vertically by 1/sy followed by a counter-clockwise rotation by α. ObjAffineSet() does almost exactly what obj_aff_rotscale() and bg_aff_rotscale() do, except that ObjAffineSet() can also set multiple matrices at once.

The source data is kept in ObjAffineSource (i.e., AFF_SRC) structs. Now, as the routine sets affine matrices, you might think that the destinations are either OBJ_AFFINE or ObjAffineDest structs. However, you’d be wrong. Well, partially anyway. The problem is that the destination always points to a pa-element, which is not necessarily the first element in struct. You will make the mistake of simply supplying an OBJ_AFFINE pointer when you try to use it to fill those. Don’t say I didn’t warn you.

Two other things need to be said here as well. First, once again we have a bit of a misnomer: ObjAffineSet doesn’t really have much to do with objects per se, but can be used in that capacity by setting r3 to 8 instead of 2. The second is that the routine can also be used to set up multiple arrays via r2. However, be careful when you do this with devkitPro 19. ObjAffineSet() expects its source structs to be word-aligned, which they won’t be unless you add the alignment attributes yourself.

// Source struct. Note the alignment!
typedef struct AFF_SRC
    s16 sx, sy;
    u16 alpha;
} ALIGN4 AFF_SRC, ObjAffineSource;

// Dst struct for background matrices
typedef struct Aff_DST
    s16 pa, pb;
    s16 pc, pd;
} ALIGN4 AFF_DST, ObjAffineDest;

// Dst struct for objects. Note that r1 should be
// the address of pa, not the start of the struct
typedef struct OBJ_AFFINE
    u16 fill0[3];    s16 pa;
    u16 fill1[3];    s16 pb;
    u16 fill2[3];    s16 pc;
    u16 fill3[3];    s16 pd;

Using BIOS calls

Assembly for BIOS calls

You might think this whole discussion was rather pointless since you can’t access the registers and the swi instruction unless you use assembly, which will be pretty tough, right? Well, no, yes and no. The necessary assembly steps for BIOS calls are actually rather simple, and are given below.

@ In tonc_bios.s

@ at top of your file
    .text           @ aka .section .text
    .code 16        @ aka .thumb

@ for each swi (like division, for example)
    .align 2        @ aka .balign 4
    .global Div
    swi     0x06
    bx      lr

This is assembly code for the GNU assembler (GAS); for Goldroad or ARM STD the syntax is likely to be slightly different. The first thing you need to do is give some directives, which tells some details about the following code. In this case, we use the ‘.text’ to put the code in the text section (ROM or EWRAM for multiboot). We also say that the code is Thumb code by using ‘.code 16’ or ‘.thumb’. If you place these at the top of the file, they’ll hold for the rest of the thing. For each BIOS call, you’ll need the following 6 items.

  • Word-alignment. Or at least halfword alignment, but words are probably preferable. There are two directives for this, .align n and .balign m. The former aligns to 2 n so requires ‘.align 2’; the latter aligns to m so you can just use ‘balign m’. Note that both will only work on the next piece of code or data and no further, which is why it’s best to add it for each function.
  • Scope. The .global name directive makes a symbol out of name, which will then be visible for other files in the project as well. A bit like extern or, rather, an anti-static.
  • Thumb indicator It would seem that .code 16 alone isn’t enough, you also need .thumb_func. In fact, if I read the manual correctly this one also implies .code 16, which would make that directive redundant.
  • Label. ‘name:’ marks where the symbol name starts. Obviously, to use a function it must actually exist.
  • BIOS call To actually activate the BIOS call, use ‘swi n’, with n the BIOS call you want.
  • Return And we’re practically done already, all we have to do now is return to the caller with ‘bx lr’.

See? It’s really not that complicated. Sometimes you might want a little more functionality than this, but for the most part you only need two measly instructions.

The Arm Architecture Procedure Call Standard

That’s all fine and good, but that still leaves the questions of a) how do I combine this with C code and b) where’d all the input and output go? The answer to the first is simple: just add a function declaration like usual:

// In tonc_bios.h

int Div(int num, int denom);

Mkay, but that still doesn’t explain where my input and output went. Well actually … it does.

“I am not sure how clouds get formed. But the clouds know how to do it, and that is the important thing”

Found that quote long ago in one of those Kids on Science lists, and I’m always reminded of it when programming. The thing about computers is that they don’t think in terms of input, output, text, pictures etc. Actually, they don’t think at all, but that’s another story. All a computer sees is data; not even code and data, just data since code is data too. Of course, you may not see it that way because you’re used to C or VB or whatever, but when all is said and done, it’s all just ones and zeros. If the ones and zeros come to the CPU via the program counter (PC register, r15) it’s code, otherwise it’s data.

So how does that explain the input/output? Well, it doesn’t do it directly, but it points to how you should be looking at the situation. Consider you’re the compiler and you have to convert some bloke’s C code into machine code (or assembly, which is almost the same thing) that a CPU can actually use. You come across the line “q= Div(x,y);”. What does Div() do? Well, if there’s no symbol in the C-file for that name (and there isn’t, as it’s in tonc_bios.s), you wouldn’t know. Technically, you don’t even know what it is. But Div knows, and that’s the important thing. At least, that’s almost how it works. The compiler should still need to know what sort of critter Div is to avoid confusion: A variable? A macro? A function? That’s what the declarations are for. And the declaration above says that Div is a function that expects two signed integers and returns one too. As far as the compiler’s concerned, it ends there.

Of course, that still doesn’t explain how the compiler knows what do to. Well, it simply follows the ARM Architecture Procedure Call Standard, AAPCS for short. This states how functions should pass arguments to each other. This PDF document can be found here and if you’re contemplating assembly is a very worthwhile download.

For now, here’s what you need to know. The first four arguments are placed in the first four registers r0-r3, every one after that is placed on the stack. The output value is placed in r0. As long as you take the argument list of the BIOS call as the list in the declaration, it should work fine. Note that the declaration also takes care of any casting that needs to be done. It is important that you realize just what the declaration means here: it determines how the function is called, not the actual definition assembly function. Or even C function. Things can go very wrong if you mess up the declaration.

Another thing the AAPCS tells you is that register r0-r3 (and r12) are so-called scratch registers. This means that the caller expects the called function to mess these up. After the function returns their contents are to be considered undefined – unless you’re the one writing both asm functions, in which case there may be certain … allowances. Having these as scratch registers means that a function can use them without needing to push and pop the originals on and off the stack, thus saving time. This does not hold for the other registers, though: r4-r11, r13, r14 must be returned in the way the calling function got them. The last one, r15, is exempt from this, as you shouldn’t screw around with the program counter.

Inline assembly

Actually, you don’t even need a full assembly file for BIOS calls: you could use inline assembly. With inline assembly, you can mix C code and assembly code. Since the functions are usually rather simple, you could use something like

// In a C file
int Div(int num, int denom)
{   asm("swi 0x06");   }

This does exactly the same thing as the assembly version of Div. However, you need to be careful with inline assembly because you can’t see the code around it and might accidentally clobber some registers that you shouldn’t be messing with, thus ruining the rest of the code. For the full rules on inline assembly, see the GCC manual. You can also find a short faq on inline assembly use at devrs.com. The ‘proper’ syntax of inline assembly isn’t the friendliest in the world, mind you, and there are other problems as well. Consider the C function given above. Since it doesn’t really do anything itself, the optimiser may be tempted to throw it away. This will happen with -O3 unless you take appropriate precautions. Also, the compiler will complain that the function doesn’t return anything, even though it should. It has a point, of course, considering that part is taken care of inside the assembly block. There are probably a few other problems that I’m not aware of at present; in the end it’s easier to use the full-assembly versions so you know what’s happening.

The swi_call macro

On the other hand, there are also BIOS calls that use no arguments, which can be run via a mere macro. The swi_call(x) macro will run the BIOS call x, and can be found in swi.h, and in Wintermute’s libgba, which is where I got it from. It’s a little more refined than the Div function given above. First, it uses the volatile keyword, which should keep your optimizer from deleting the function (just like we did for all the registers). Secondly, it uses a clobber list (after the triple colons). This will tell the compiler which registers are used by the inline assembly. Thirdly, it will take care of the Thumb/ARM switch automatically. If you use the -mthumb compiler option, the compiler will define __thumb__ for us, which we will now use to get the right swi-number. Clever, eh?

#define swi_call(x)   asm volatile("swi\t"#x ::: "r0", "r1", "r2", "r3")
#define swi_call(x)   asm volatile("swi\t"#x"<<16" ::: "r0", "r1", "r2", "r3")

By the way, if you want more information about assembly, you can find a number of tutorials on ARM assembly at gbadev.org. Another nice way to learn is by using the -S compiler flag, which will give you a compiler-generated assembly file of your code. This will show you exactly what the compiler does to your code, including optimisation steps and use of the AAPCS. Really, you should see this at least once.

It may also help to use -fverbose-asm, which will write out the original variable names and operations in comments. Usually in the right place too. Also handy is the ASM_CMT() macro shown below. This will give you some hints as to where specific blocks of code are. But again, not all the time.

#define ASM_CMT(str) asm volatile("@ " str)

//In code. Outputs "@ Hi, I'm here!" in the generated asm
ASM_CMT("Hi, I'm here!");

Demo graphs

math graphs

Fig 17.1: div, sqrt, arctan2, sin and cos graphs, courtesy of BIOS.

To illustrate the use of BIOS calls I am using Div, Sqrt, ArcTan and ObjAffineSet to create graphs if a hyperbole, square root, sine and cosine. I’ve scaled them in such a way so that they fit nicely on the 240x160 screen. The definitions are

division y= 2560/x
square root y= 160*sqrt(x/240)
arctan y= 80 + 64*(2/π)*(arctan(x-120)/16))
sine y= 1*sy*sin(2π·x/240) ; sy= 80
cosine y= 80*sx*cos(2π·x/240) ; sx= 1

and these functions have been plotted in fig 1. If you’re wondering how I got the sine and cosine values, as there are no calls for those, take a look at eq 1 again. The P matrix has them. I’ve used pa for the cosine and pc for the sine. Note that the graphs appear instantly; there is no sense of loading time while the graphs are plotted. An earlier version of the mode 7 demo (or PERNs mode 7 demo) used calls to the actual division, sine and cosine functions to build up the LUTs. Even with the symmetry rules of trigonometry, sin() and cos() are still noticeably slower than the BIOS equivalent.

#include <stdio.h>
#include <tonc.h>

// === swi calls ======================================================

// Their assembly equivalents can be found in tonc_bios.s

void VBlankIntrWait()
{   swi_call(0x05); }

int Div(int num, int denom)
{   swi_call(0x06); }

u32 Sqrt(u32 num)
{   swi_call(0x08); }

s16 ArcTan2(s16 x, s16 y)
{   swi_call(0x0a); }

void ObjAffineSet(const AFF_SRC *src, void *dst, int num, int offset)
{   swi_call(0x0f); }

// === swi demos ======================================================

// NOTE!
// To be consistent with general mathematical graphs, the 
// y-axis has to be reversed and the origin moved to the 
// either the bottom or mid of the screen via
// "iy = H - y"
// or
// "iy = H/2 - y"
// functions have been scaled to fit the graphs on the 240x160 screen

// y= 2560/x
void div_demo()
    int ix, y;

    for(ix=1; ix<SCREEN_WIDTH; ix++)
        y= Div(0x0a000000, ix)>>16;
        if(y <= SCREEN_HEIGHT)
            m3_plot(ix, SCREEN_HEIGHT - y, CLR_RED);
    tte_printf("#{P:168,132;ci:%d}div", CLR_RED);

// y= 160*sqrt(x/240)
void sqrt_demo()
    int ix, y;
    for(ix=0; ix<SCREEN_WIDTH; ix++)
        y= Sqrt(Div(320*ix, 3));
        m3_plot(ix, SCREEN_HEIGHT - y, CLR_LIME);
    tte_printf("#{P:160,8;ci:%d}sqrt", CLR_LIME);

// y = 80 + tan((x-120)/16) * (64)*2/pi
void arctan2_demo()
    int ix, y;
    int ww= SCREEN_WIDTH/2, hh= SCREEN_HEIGHT/2;
    for(ix=0; ix < SCREEN_WIDTH; ix++)
        y= ArcTan2(0x10, ix-ww);
        m3_plot(ix, hh - y/256, CLR_MAG);
    tte_printf("#{P:144,40;ci:%d}atan", CLR_MAG);

// wX= 1, wY= 80
// cc= 80*sx*cos(2*pi*alpha/240)
// ss=  1*sy*sin(2*pi*alpha/240)
void aff_demo()
    int ix, ss, cc;
    ObjAffineSource af_src= {0x0100, 0x5000, 0};    // sx=1, sy=80, alpha=0
    ObjAffineDest af_dest= {0x0100, 0, 0, 0x0100};  // =I (redundant)

    for(ix=0; ix<SCREEN_WIDTH; ix++)
        ObjAffineSet(&af_src, &af_dest, 1, BG_AFF_OFS);
        cc= 80*af_dest.pa>>8; 
        ss= af_dest.pc>>8;
        m3_plot(ix, 80 - cc, CLR_YELLOW);
        m3_plot(ix, 80 - ss, CLR_CYAN);
        // 0x010000/0xf0 = 0x0111.111...
        af_src.alpha += 0x0111;

    tte_printf("#{P:48,38;ci:%d}cos", CLR_YELLOW);
    tte_printf("#{P:72,20;ci:%d}sin", CLR_CYAN);

// === main ===========================================================

int main()



    return 0;

Vsyncing part II, VBlankIntrWait

Until now, all demos used the function vid_vsync to synchronize the action to the VBlank (see the graphics introduction). What this did was to check REG_VCOUNT and stay in a while loop until the next VBlank came along. While it works, it’s really a pretty poor way of doing things for two reasons. First, because of the potential problem when you are in a VBlank already, but that one had been covered. The second reason is more important: while you’re in the while loop, you’re wasting an awful lot of CPU cycles, all of which slurp battery power.


Fig 17.2: swi_vsync demo.

There are a number of BIOS calls that can put the CPU into a low power mode, thus sparing the batteries. The main BIOS call for this is Halt (#2), but what we’re currently interested in is VBlankIntrWait (#5). This will set things up to wait until the next VBlank interrupt. To use it, you have to have interrupts switched on, of course, in particular the VBlank interrupt. As usual, the VBlank isr will have to acknowledge the interrupt by writing to REG_IF. But it also has to write to its BIOS equivalent, REG_IFBIOS. This little bit of information is a little hard to find elsewhere (in part because few tutorials cover BIOS calls); for more info, see GBATEK, BIOS Halt Functions. Fortunately for us, the switchboard presented in the interrupts section has this built in.

To show you how to set it up, see the swi_vsync demo. The most important code is given below; a screen shot can be found in fig 2. What it does is give a rotating metroid sprite with an angular velocity of π rad/s (this corresponds to Δθ = 0x10000/4/60= 0x0111). The basic steps for interrupt handling should be familiar, except the fact that there’s no real VBlank isr because the switchboard already takes care of acknowledging the interrupt. After that it’s pretty simple: we use ObjAffineSet() to calculate the required affine matrix and VBlankIntrWait puts the CPU on Halt until the next VBlank interrupt.

// inside main, after basic initialisations

AFF_SRC as= { 0x0100, 0x0100, 0 };

// enable isr switchboard and VBlank interrupt
irq_add(II_VBLANK, NULL);


    // Full circle = 10000h
    // 10000h/4/60= 111h -> 1/4 rev/s = 1 passing corner/s
    as.alpha += 0x0111;
    ObjAffineSet(&as, &oaff.pa, 1, 8);

    obj_aff_copy(obj_aff_mem, &oaff, 1);

Prefer VBlankIntrWait() over vid_vsync()

Waiting for the VBlank via vid_vsync() (or its functional equivalent) is not a good idea: it wastes too much battery power. The recommended procedure is using VBlankIntrWait() to halt the processor, to be woken again on the VBlank interrupt.

Acknowledging IntrWait routines

VBlankIntrWait() is only one of the BIOS’s IntrWait() routines that can stop the CPU until an interrupt has been raised. However, it doesn’t look at REG_IF but at REG_IFBIOS (0300:7FF8) for the acknowledgement of the interrupt. If your game locks up after trying VBlankIntrWait(), this may be why. Note that you may find the address under other names, as there isn’t really an official one for it.

Final thoughts

Now that you know how to use them, I should warn you that you shouldn’t go overboard with them. It appears that the BIOS routines have been designed for space, not speed, so they aren’t the fastest in the world. Not only that, there’s an overhead of at least 60 cycles for each one (mind you, normal functions seem to have a 30 cycle overhead). If speed is what you’re after then the BIOS calls may not be the best thing; you can probably find faster routines on the web … somewhere. This doesn’t mean that the BIOS routines can’t be useful, of course, but if you have alternative methods, use those instead. Just remember that that’s an optimisation step, which you shouldn’t do prematurely.