gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback machine copies. A new forum can be found here.

C/C++ > Mode 0 - help!

#31280 - ImInABand - Thu Dec 09, 2004 3:30 pm

ok, i still have no idea how to start and run a tile mode.

could anyone help me with this? could someone tell me how to initialize a tile mode and then, using an example tileset and map, show me how to make scrolling bg maps?

if anyone could help me on this, i would be ever grateful.

#31295 - pyros - Thu Dec 09, 2004 6:46 pm

Tonc has sections called:

Video Introduction.
Regular tiled backgrounds.

as well as plenty more. I have used it to find out how to do rotated sprites and backgrounds and it's very useful indeed.

http://user.chem.tue.nl/jakvijn/tonc/toc.htm

Also see:

http://www.thepernproject.com/

or Sources at http://www.gbadev.org

and use: http://www.work.de/nocash/gbatek.htm

#31866 - ImInABand - Wed Dec 15, 2004 2:11 pm

wow, i did it. i can't believe i actually did it. After about 2 consecutive hours of staring at TONC and pern's RPG v3 demo, I was able to pool enough knowledge to get mode 0 working. wow!
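For readers arriving with the same question: getting mode 0 on screen mostly comes down to two register writes (DISPCNT and the background's BGxCNT), plus the scroll registers for moving the map. The sketch below assumes GBATek's register addresses and bit layout; the macro names and the helpers `bg_control` and `mode0_init` are hypothetical (Tonc-style naming, not taken from any particular library). Only `bg_control` is meaningful off-hardware; the register macros must only be dereferenced on a GBA.

```c
#include <stdint.h>

typedef uint16_t u16;

/* DISPCNT bits (assumed per GBATek) */
#define DCNT_MODE0   0x0000  /* video mode 0: four regular tiled BGs */
#define DCNT_OBJ_1D  0x0040  /* 1D sprite tile mapping */
#define DCNT_BG0     0x0100  /* display BG0 */
#define DCNT_OBJ     0x1000  /* display sprites */

/* BGxCNT fields */
#define BG_CBB(n)    ((n) << 2)  /* character (tile) base block 0-3, 16KB each */
#define BG_8BPP      0x0080      /* 256-color tiles */
#define BG_SBB(n)    ((n) << 8)  /* screen (map) base block 0-31, 2KB each */
#define BG_REG_32x32 0x0000      /* 256x256 pixel map */
#define BG_REG_64x32 0x4000
#define BG_REG_32x64 0x8000
#define BG_REG_64x64 0xC000

/* Hardware registers: only dereference these on a GBA. */
#define REG_DISPCNT  (*(volatile u16 *)0x04000000)
#define REG_BG0CNT   (*(volatile u16 *)0x04000008)
#define REG_BG0HOFS  (*(volatile u16 *)0x04000010)  /* write-only */
#define REG_BG0VOFS  (*(volatile u16 *)0x04000012)  /* write-only */

/* Compose a BGxCNT value from charblock, screenblock, color depth and size. */
u16 bg_control(int cbb, int sbb, int is_8bpp, u16 size)
{
    return (u16)(BG_CBB(cbb) | BG_SBB(sbb) | (is_8bpp ? BG_8BPP : 0) | size);
}

/* Hardware-only (hypothetical): mode 0, BG0 with tiles in charblock 0 and a
   32x32 map in screenblock 31, scrolled to (x, y). */
void mode0_init(int x, int y)
{
    REG_DISPCNT = DCNT_MODE0 | DCNT_BG0;
    REG_BG0CNT  = bg_control(0, 31, 0, BG_REG_32x32);
    /* Only the low 9 bits are used; the map wraps around automatically. */
    REG_BG0HOFS = (u16)x;
    REG_BG0VOFS = (u16)y;
}
```

Writing new `x`/`y` values to the scroll registers each frame is all it takes to scroll; a 32x32 map wraps around every 256 pixels.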

#31871 - Lord Graga - Wed Dec 15, 2004 4:18 pm

Can I have your babies?

#31929 - ImInABand - Thu Dec 16, 2004 12:55 am

*smacks head*

ok, another dilemma:

I'm trying to initialize sprites and place an unmoving sprite in the center of the screen, while the bg moves around it. I get one of two results: either nothing happens, or there is a black 8x8 square in the top left corner of the screen, which stays there regardless of BG scrolling.

Code:

   memcpy( (u16 *)0x06010000, &soldierData, sizeof(soldierData) );

   sprites[0].attribute0 = COLOR_256 | TALL | 112;
   sprites[0].attribute1 = SIZE_8 | 72;
   sprites[0].attribute2 = 0;


mode 0, BG0 is enabled, obj and 1d mapping are enabled.

does there seem to be anything amiss?
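A note on coordinates in the code above: attribute 0 holds the Y coordinate and attribute 1 the X coordinate, so `112` in attribute 0 and `72` in attribute 1 place the sprite at (72, 112), below and to the left of the screen's center. TALL with SIZE_8 is an 8x16 sprite, so centering it on the 240x160 screen means x = 116, y = 72. A minimal sketch, assuming GBATek's OAM bit layout; `ATTR0_8BPP`, `ATTR0_TALL` and `ATTR1_SIZE_8` stand in for the post's `COLOR_256`, `TALL` and `SIZE_8` on the assumption that those carry the standard bit values, and `OBJ_ATTR` and `centered_tall_sprite` are hypothetical names.

```c
#include <stdint.h>

typedef uint16_t u16;

/* OAM attribute fields (assumed per GBATek) */
#define ATTR0_Y(y)     ((y) & 0x00FF)  /* bits 0-7: Y coordinate */
#define ATTR0_8BPP     0x2000          /* bit 13: 256-color tiles */
#define ATTR0_TALL     0x8000          /* bits 14-15: vertical shape */
#define ATTR1_X(x)     ((x) & 0x01FF)  /* bits 0-8: X coordinate */
#define ATTR1_SIZE_8   0x0000          /* bits 14-15: size 0; with TALL = 8x16 */
#define ATTR2_TILE(n)  ((n) & 0x03FF)  /* bits 0-9: tile index, 32-byte units */

#define SCREEN_W 240
#define SCREEN_H 160

/* One OAM entry; `fill` overlaps the affine parameters. */
typedef struct { u16 attr0, attr1, attr2, fill; } OBJ_ATTR;

/* Build an 8x16, 256-color sprite centered on screen, using tile 0. */
OBJ_ATTR centered_tall_sprite(void)
{
    const int w = 8, h = 16;
    int x = (SCREEN_W - w) / 2;  /* 116 */
    int y = (SCREEN_H - h) / 2;  /* 72 */
    OBJ_ATTR obj;
    obj.attr0 = (u16)(ATTR0_8BPP | ATTR0_TALL | ATTR0_Y(y));
    obj.attr1 = (u16)(ATTR1_SIZE_8 | ATTR1_X(x));
    obj.attr2 = (u16)ATTR2_TILE(0);
    obj.fill  = 0;
    return obj;
}
```

The result still has to be copied into OAM (0x07000000) before it shows up. For 256-color sprites the tile index in attribute 2 counts 32-byte units, so each 8bpp tile uses two indices.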

#31931 - ScottLininger - Thu Dec 16, 2004 1:36 am

Are you using VisualBoy Advance?

If so, first use the tile viewer to make sure that your tile data is getting copied into sprite tile memory. If it is, you'll see the tiles for your soldier sitting right there.

A few common problems:

1. Not copying in your sprite palette, which can give you black or oddly colored sprites. If you have all of your sprite palette entries set to 0x0000 (black), then this could give you the "black square" problem.

2. Not copying from your sprites[] array into OAM. OAM is what actually gets stuff displayed.

3. Having your source graphic generated as a linear bitmap array rather than a "tiled" array. Tiled gfx are stored in a different order than a "flat" bitmap: the pixels of each 8x8 tile are stored together, row by row, before the next tile starts. Make sure you're using the appropriate gfx converter. (Though this would generally give you garbled sprites rather than nothing at all.)
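Point 3 in code form: a converter that reorders an 8bpp linear bitmap into tile order. This is a host-side sketch of what a graphics converter does for you, assuming width and height are multiples of 8; `linear_to_tiled_8bpp` is a hypothetical helper, not part of any tool mentioned in this thread.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;

/* Reorder an 8bpp linear bitmap (w x h, both multiples of 8) into tile order:
   tiles left to right, top to bottom, each tile's 64 pixels row by row. */
void linear_to_tiled_8bpp(const u8 *src, int w, int h, u8 *dst)
{
    size_t out = 0;
    for (int ty = 0; ty < h / 8; ty++)          /* tile row */
        for (int tx = 0; tx < w / 8; tx++)      /* tile column */
            for (int py = 0; py < 8; py++)      /* pixel row inside tile */
                for (int px = 0; px < 8; px++)  /* pixel column inside tile */
                    dst[out++] = src[(ty * 8 + py) * w + (tx * 8 + px)];
}
```

For an 8x16 sprite in 1D mapping, converting an 8-pixel-wide, 16-pixel-tall image this way yields the top tile followed by the bottom tile, which is the order 1D mapping expects.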

You probably have already checked the above, but it's hard to tell from your source. You may want to post a wider view of your code.

Cheers,

Scott

#31935 - ImInABand - Thu Dec 16, 2004 2:32 am

i used vba to debug some of this, and when i looked at the sprite viewer, none of the parameters i set for sprite 0 were there; in fact it still showed the default values. i don't understand this, because i am going about this the _exact_ way i do when i'm working in mode 4, since sprites operate the same regardless of mode

#31988 - ScottLininger - Thu Dec 16, 2004 5:42 pm

How are you copying your map tiles into VRAM? Since Sprite tile RAM starts where map VRAM ends, it's possible that you're overwriting your sprite tiles.
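That overlap is easy to check with address arithmetic. Assuming GBATek's memory map for modes 0-2 (background charblocks of 16KB starting at 0x06000000, sprite tiles starting at 0x06010000), a small hypothetical helper:

```c
#include <stdbool.h>
#include <stdint.h>

#define VRAM_BASE      0x06000000u
#define CHARBLOCK_SIZE 0x4000u      /* 16KB per background charblock */
#define OBJ_VRAM_BASE  0x06010000u  /* sprite tiles in modes 0-2 */

/* Start address of background charblock n (0-3). */
uint32_t charblock_addr(unsigned n)
{
    return VRAM_BASE + n * CHARBLOCK_SIZE;
}

/* True if copying `bytes` of tile data into charblock n would run past the
   end of background VRAM and into sprite tile memory. */
bool bg_tiles_overwrite_obj(unsigned n, uint32_t bytes)
{
    return charblock_addr(n) + bytes > OBJ_VRAM_BASE;
}
```

Map data (screenblocks) shares the same 64KB as the charblocks, so a map placed inside a charblock that also holds tiles can clobber them in the same way.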

But it sounds more like there's a step missing in your copy to OAM step. Are you waiting for Vblank before copying to OAM? You can only copy to OAM during VBlank, so the standard approach is to do this *immediately* once VBlank begins.
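The usual pattern is to keep a shadow copy of the sprite attributes in ordinary RAM, change that during the frame, and copy it into OAM right after VBlank starts. A minimal sketch, assuming GBATek's addresses (REG_VCOUNT at 0x04000006, OAM at 0x07000000, VBlank on scanlines 160-227); `OBJ_ATTR`, `vsync_busy` and `oam_copy` are hypothetical names, and the busy-wait is for simplicity where a real program would more likely use the VBlank interrupt with the BIOS `VBlankIntrWait` call.

```c
#include <stdint.h>

typedef uint16_t u16;

/* One OAM entry: 8 bytes; `fill` overlaps the affine parameters. */
typedef struct { u16 attr0, attr1, attr2, fill; } OBJ_ATTR;

#define REG_VCOUNT (*(volatile u16 *)0x04000006)  /* current scanline */
#define OAM        ((volatile OBJ_ATTR *)0x07000000)

/* Hardware-only: wait until the next VBlank begins (scanline 160). */
void vsync_busy(void)
{
    while (REG_VCOUNT >= 160) ;  /* finish any VBlank already in progress */
    while (REG_VCOUNT < 160) ;   /* wait for the next one to start */
}

/* Copy `count` shadow entries with 16-bit writes
   (8-bit writes to OAM are ignored by the hardware). */
void oam_copy(volatile OBJ_ATTR *dst, const OBJ_ATTR *src, int count)
{
    for (int i = 0; i < count; i++) {
        dst[i].attr0 = src[i].attr0;
        dst[i].attr1 = src[i].attr1;
        dst[i].attr2 = src[i].attr2;
        dst[i].fill  = src[i].fill;
    }
}
```

On hardware, the main loop would call `vsync_busy(); oam_copy(OAM, shadow, 128);` once per frame; passing the destination as a parameter also lets the copy be exercised with an ordinary array.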

-Scott

#32075 - ImInABand - Fri Dec 17, 2004 11:38 pm

its ok, i figured out what was wrong. syntax error that the compiler somehow overlooked [maybe not an ERROR, but the code was missing a few characters to make the code work properly]

everything is fine now, problem solved, and i am in mode 0 just fine now.

=)

quick question though, concerning VRAM writes:

I used to implement the map data using a loop, as such [mode 4 example, but the principle is the same]:


Code:
u16 loop;

// copy 256 halfwords of background data into the front buffer
for(loop = 0; loop < 256; loop++){
FrontBuffer[loop] = BGdata[loop];
}


I did a similar thing when i first used mode 0, just with another declared variable and two nested for() loops. yesterday i found that you get the same output when you use memcpy() to write the map data directly into VRAM.

Is there any sort of downfall to using memcpy(), or is it one of the many cases in C where you can do one thing in a completely different way and still get the same output?
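For a map that is exactly as wide as the screenblock (32 entries), the two approaches give identical results, because the source rows are already laid out back to back just like the screenblock's rows. A host-side sketch with plain arrays standing in for VRAM; `map_copy_loops` and `map_copy_memcpy` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t u16;

#define MAP_W 32  /* a regular screenblock is 32x32 entries */
#define MAP_H 32

/* Entry-by-entry copy with two nested loops. */
void map_copy_loops(u16 *screenblock, const u16 *map)
{
    for (int y = 0; y < MAP_H; y++)
        for (int x = 0; x < MAP_W; x++)
            screenblock[y * MAP_W + x] = map[y * MAP_W + x];
}

/* Same result in one call, because the source is exactly 32 entries wide. */
void map_copy_memcpy(u16 *screenblock, const u16 *map)
{
    memcpy(screenblock, map, MAP_W * MAP_H * sizeof(u16));
}
```

For a map wider than 32 tiles the rows are no longer contiguous in the destination, so a row-by-row copy is needed instead of a single `memcpy`. Whether `memcpy` is safe for VRAM at all depends on it never falling back to byte writes.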


#32087 - pyros - Sat Dec 18, 2004 2:14 am

not sure about memcpy() but DMA3 is good for stuff like that. well if you need speed it's handy. tonc also covers it. not so useful if you're only say writing a section of a 2D map rather than the full width though, as it is a linear copy.
http://user.chem.tue.nl/jakvijn/tonc/dma.htm
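A minimal sketch of a DMA3 copy, assuming GBATek's register layout (source at 0x040000D4, destination at 0x040000D8, count and control at 0x040000DC; bit 31 starts the transfer, bit 26 selects 32-bit units). `dma3_cnt32` and `dma3_copy32` are hypothetical helpers; only the former is meaningful off-hardware.

```c
#include <stdint.h>

typedef uint32_t u32;

#define REG_DMA3SAD (*(volatile u32 *)0x040000D4)  /* source address */
#define REG_DMA3DAD (*(volatile u32 *)0x040000D8)  /* destination address */
#define REG_DMA3CNT (*(volatile u32 *)0x040000DC)  /* count (low 16) + control (high 16) */

#define DMA_32     0x04000000u  /* transfer 32-bit units */
#define DMA_ENABLE 0x80000000u  /* start the transfer */

/* Control word for an immediate, incrementing copy of `words` 32-bit units.
   Keep `words` within 1..0xFFFF here. */
u32 dma3_cnt32(u32 words)
{
    return DMA_ENABLE | DMA_32 | (words & 0xFFFF);
}

/* Hardware-only: copy `bytes` (a multiple of 4) from src to dst with DMA3. */
void dma3_copy32(void *dst, const void *src, u32 bytes)
{
    REG_DMA3SAD = (u32)(uintptr_t)src;
    REG_DMA3DAD = (u32)(uintptr_t)dst;
    REG_DMA3CNT = dma3_cnt32(bytes / 4);
}
```

Because each transfer is linear, copying part of a wider 2D map means one transfer per row, with the source and destination pointers advanced by their own row pitches.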

#32406 - Cearn - Wed Dec 22, 2004 12:40 pm

A not-so-little post on memory copies:

--< ABSTRACT >--
Copy methods on the GBA are like standards: there are so many to choose from. DMA, BIOS, loops, memcpy, and then there are still such things as IWRAM vs ROM/EWRAM and ARM vs THUMB code. Although they all do the same thing, they differ in speed, so here is a list of copy speeds for the various methods, as measured with the GBA timers. These tests look at copying data from ROM/EWRAM to VRAM. Compilation was done with -O2 optimization and interworking on; tests were performed on both hardware and emulators, and under different devkits.

The main results were:
  • With the exception of the u16 index/u16 array loop-copy, there is no difference between dkArm and dkAdv-r5b3
  • Tests were run on a regular GBA, a GBA SP (no difference here, fortunately), no$gba and VBA. ROM tests were done only on no$gba. Do not use VBA for profiling. no$gba does a very good job, though strangely it does not work properly for the BIOS functions.
  • For copies, DMA is fastest (2 cycles/byte), closely followed by CpuFastSet (2.23). TILE-copy (especially ARMed and IWRAMed) deserves a special mention too, as does memcpy.
  • Results for loop copies depend greatly on where you put them and which instruction set they use. But even in the best case they don't measure up to the methods mentioned above. And in the worst case, well, let's not go there.
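Since CpuFastSet comes up repeatedly: per GBATek it is BIOS call 0x0C, taking the source in r0, the destination in r1 and a length/mode word in r2 (word count in bits 0-20, bit 24 set for a fill), working in blocks of 8 words and expecting word-aligned addresses. The sketch below only builds and checks those arguments; the SWI itself goes through whatever wrapper your toolchain provides. `cpufastset_mode` and `cpufastset_aligned` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t u32;

#define CFS_FILL (1u << 24)  /* bit 24: fixed source, i.e. fill with one word */

/* Length/mode word for r2. The BIOS works in blocks of 8 words (32 bytes),
   so the count is rounded up here to show how much memory really gets written. */
u32 cpufastset_mode(u32 bytes, bool fill)
{
    u32 words = (bytes + 3) / 4;   /* bytes -> words, rounded up */
    words = (words + 7) & ~7u;     /* words -> whole 8-word blocks */
    return (words & 0x1FFFFF) | (fill ? CFS_FILL : 0);
}

/* Both source and destination must be word aligned. */
bool cpufastset_aligned(uintptr_t src, uintptr_t dst)
{
    return ((src | dst) & 3) == 0;
}
```

For the 38400-byte test case the mode word is simply 9600, which is already a multiple of 8 words.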
A zip-file with the full results, ROM/multiboot images and code can be found here

--< TEST CONDITIONS >--
Tests were performed on various methods of data copying, using the GBA timers. Basically, I set up a cascaded timer pair, the lower one running at 1 tick/cycle, run the copy code, then stop the timers. This gives a 32-bit cycle count for N bytes copied. Using N=38400 and 8192, these numbers give a cycles/byte value for each of the test cases.

Profiling macros:
Code:

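/* TM2 counts CPU cycles (prescaler F/1); TM3 is cascaded and counts TM2 overflows. */
#define PROF_START() {                     \
   REG_TM2CNT= 0; REG_TM3CNT= 0;       \
   REG_TM3CNT= TM_ON | TM_CASCADE;     \
   REG_TM2CNT= TM_ON;  }

/* Stopping TM2 also freezes TM3; combine both halves into a 32-bit cycle count. */
#define PROF_END(__time)                  \
{   REG_TM2CNT= 0; __time=((u32)REG_TM3D<<16) | REG_TM2D;   }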
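The arithmetic behind the numbers below is simple: with the lower timer at one tick per cycle and the upper timer cascaded, the two counter halves form one 32-bit cycle count, and dividing by the bytes copied gives cycles/byte. A host-side sketch of that step; `prof_ticks` and `cycles_per_byte` are hypothetical helpers.

```c
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

/* Combine the cascaded halves: the upper timer counts lower-timer overflows. */
u32 prof_ticks(u16 tm3d, u16 tm2d)
{
    return ((u32)tm3d << 16) | tm2d;
}

/* With the lower timer at one tick per CPU cycle, ticks are cycles. */
double cycles_per_byte(u32 ticks, u32 bytes)
{
    return (double)ticks / (double)bytes;
}
```

For example, 3.44 cycles/byte over 38400 bytes corresponds to 132096 cycles.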


Test cases:
  • 8192 and 38400 bytes to VRAM
  • dkARM vs dkADV 5b3
  • hardware vs no$gba vs VBA
  • EWRAM code/data (i.e. multiboot) vs ROM code/data
  • ARM+EWRAM/ROM vs THUMB+EWRAM/ROM vs ARM+IWRAM
copy modes:
  • DMA:
    - DMA 16: DMA by 16bits
    - DMA 32: DMA by 32bits
  • BIOS
    - CpuSet (32bits)
    - CpuFastSet
  • good ol' memcpy
  • manual looping in C
    - int/u16: u16 array looped by int index
    - int/u32: u32 array looped by int index
    - TILE: TILE array looped by int index
    - u16/u16: u16 array looped by u16 index
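The loop cases are only described by name, so here is one plausible reading of them; this is a reconstruction from the labels, not the original test code. `n` counts elements rather than bytes, and the function names are hypothetical. The TILE variant follows from the struct copy described next.

```c
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

/* int/u16: u16 array, int index */
void copy_int_u16(u16 *dst, const u16 *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i];
}

/* int/u32: u32 array, int index; half as many iterations for the same bytes */
void copy_int_u32(u32 *dst, const u32 *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i];
}

/* u16/u16: u16 array, u16 index; the compiler must keep `i` truncated to
   16 bits, which costs extra instructions (and limits n to 65535). */
void copy_u16_u16(u16 *dst, const u16 *src, u16 n)
{
    for (u16 i = 0; i < n; i++)
        dst[i] = src[i];
}
```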

A little more info on 'TILE copies': a tile is a struct defined as
Code:

typedef struct { u32 data[8]; } TILE;

and a copy goes like this
Code:

TILE dst, src= {{ 0x11111111, 0x22222222 }};   // remaining words are zero
dst= src;                                        // copies all 32 bytes

This copies the entire struct from src to dst. Yes, this works. Very well, I might add.
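Extending that to a run of tiles gives the TILE loop copy from the test list: each iteration moves a whole 32-byte struct, which GCC can emit as ldmia/stmia pairs. A sketch; `tile_copy` is a hypothetical name, and whether the compiler actually uses block transfers depends on its version and flags.

```c
#include <stdint.h>

typedef uint32_t u32;

typedef struct { u32 data[8]; } TILE;  /* one 4bpp tile: 32 bytes */

/* Copy `count` tiles by struct assignment, 32 bytes per iteration. */
void tile_copy(TILE *dst, const TILE *src, int count)
{
    for (int i = 0; i < count; i++)
        dst[i] = src[i];
}
```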

--< RESULTS >--
Units are cycles/byte copied
hw = hardware
x/y,z = instruction set x in section y, data in section z
x = T(humb) or A(rm)
y,z= EW(RAM), IW(RAM) or RO(M)

Platform comparison
Code:

 MultiBoot | dkArm/hw | dkArm/no$ | dkArm/vba |dkA5b3
-----------+----------+-----------+-----------+-------
    DMA 16 |  2.00    |  2.00     |  0.01     |  2.00
    DMA 32 |  2.00    |  2.00     |  0.00     |  2.00
    CpuSet |  4.00    |  0.00     |  0.00     | -
CpuFastSet |  2.23    |  0.00     |  0.00     | -
    memcpy |  4.74    |  4.69     |  4.32     |  4.72
   int/u16 | 14.53    | 14.50     | 11.50     | 14.53
   int/u32 |  8.27    |  8.25     |  6.25     |  8.27
      TILE |  3.44    |  3.41     |  3.22     |  3.45
   u16/u16 | 17.58    | 17.51     | 14.51     | 22.01


Instruction set / section comparison. Note that DMA, BIOS calls and memcpy are out of the user's control: they cannot be put in IWRAM or be affected by the instruction set.
Code:

mode/sec,sec | T/EW,EW | A/IW,EW | T/RO,RO | A/RO,RO
-------------+---------+---------+---------+---------
      DMA 16 |  2.00   |   -     |  2.00   |  2.00
      DMA 32 |  2.00   |   -     |  2.00   |  2.00
      CpuSet |  4.00   |   -     |   -     |  -
  CpuFastSet |  2.23   |   -     |   -     |  -
      memcpy |  4.74   |   -     |  5.82   |  5.82
     int/u16 | 14.53   |  6.53   | 17.51   | 29.51
     int/u32 |  8.27   |  4.00   |  9.75   | 14.26
        TILE |  3.44   |  2.48   |  3.85   |  4.63
     u16/u16 | 17.58   |  7.78   | 20.50   | 35.51


--< CONCLUSIONS >--
  • On method of profiling
    I did this with the timer registers (see macros given above). The DMA results agree with the CowBite's DMA Transfer ratings and DekuTree's speed tests in this thread, so if I screwed up somewhere at least I'm in good company.
    The macros do take a few cycles to start/stop, but since the copies themselves run well into the 10k-1M range, this should be insignificant.
    I don't get the 'exact' same count after every compile / hardware run. For example, the 38400 byte TILE copy with T/EW,EW had 675046 on the first try, but 674988 on the second. This is insignificant, though.
  • All hail DMA (and CpuFastSet too)
    That DMA is best for copies shouldn't come as much of a surprise; it's its sole purpose after all. That said, CpuFastSet is very close to it, and in the previously mentioned thread DekuTree showed that for memory fills the latter is actually faster. Read GBATek's entry on CpuFastSet carefully before using it, though; data alignment could cause trouble. Also, DMA and BIOS calls have some setup overhead, so they may not be the best choice for small copies.
  • memcpy
    The memcpy function is actually pretty fast, usually beating the loop copies. I guess one reason it's not seen often is that it might copy by bytes, which wouldn't work for VRAM, where 8-bit writes don't behave like ordinary memory. See this thread for example. I didn't notice anything of the sort with dkArm, dkAdv-r5b3 or dkAdv-r4. I distinctly remember having similar problems a long time ago, but then I sometimes distinctly remember things that never happened. Maybe it's only a factor with small copies; I don't know.
  • Loop copies
    Suck. Well, mostly. This is actually a very complicated case, as can be seen from the measurements.
    Two major influences are where the code is stored and which instruction set it uses. EWRAM and ROM are 16-bit areas, so each 32-bit ARM instruction takes 2 fetches, whereas a 16-bit THUMB instruction takes only one. While ARM code may need fewer instructions, it can't really compete with the double fetches, as the last two columns show: ROM+THUMB consistently outperforms ROM+ARM by nearly a factor of 2. With IWRAM+ARM code, the opposite is true. In fact, you can get pretty close to the DMA/BIOS routines there.
    A third major factor is the chunk size per loop iteration. The compiled code has 6 or more instructions PER ITERATION (yes, yes, I know you only need 3 if you punch in the assembly manually, but we're talking about C code here). That's regardless of type, so simply because a u16 array needs twice as many iterations as a u32 array, it'll take twice as long. A u16 loop variable makes it even slower, because the compiler adds 2 extra shift instructions to cut the register down to 16 bits, so a u16 array with a u16 loop variable is about the worst you can do. The GBA is a 32-bit machine, remember? Use 32-bit variables where you can.
    The TILE loop is another beast entirely, because it's a struct copy. If x and y are structs, then x=y copies the whole struct (which is why you pass pointers to structs to functions). Apparently, the compiler is smart enough to use ldmia/stmia (LoaD/STore Multiple, Increment After) pairs for this. You get a few more instructions per loop iteration, but also far fewer iterations, which is why this one is able to compete.
    Lastly, although not presented here, the optimization level also has a great effect. The default (-O0) is absolutely hideous, especially for u16 arrays. -O1 and -O2 give the same results, and strangely enough -O3 is actually a little worse than -O2.
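The memcpy caveat above comes down to how VRAM treats 8-bit stores. Per GBATek, an 8-bit write to BG VRAM stores the byte into both halves of the addressed halfword, and 8-bit writes to OBJ VRAM and OAM are ignored, so a memcpy that falls back to byte copies corrupts data instead of copying it. A host-side model of the BG VRAM case (not hardware code; `bg_vram_store8` is a hypothetical name):

```c
#include <stdint.h>

typedef uint8_t u8;
typedef uint16_t u16;

/* Model of an 8-bit store to BG VRAM: the byte lands in BOTH halves of the
   addressed halfword instead of only the addressed byte. */
void bg_vram_store8(u16 *vram, uint32_t byte_offset, u8 value)
{
    vram[byte_offset / 2] = (u16)(value | (value << 8));
}
```

Copying the halfword 0x1234 byte by byte (0x34 first, then 0x12, since the GBA is little-endian) therefore leaves 0x1212 behind.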

#32457 - Miked0801 - Thu Dec 23, 2004 1:46 am

You're forgetting method 4, the one I like to use on smaller copies: direct ldmia/stmia pairs. To get the compiler to do this for you, use struct-to-struct assignments.

typedef struct
{
u32 a, b;
}STRUCT_8_BYTES;

{
STRUCT_8_BYTES a,b;
a = b;
}

or for other data

{
u8 data[16] __attribute__((aligned(4))) = { /* source bytes */ };  // ldm/stm need word alignment
u32 dest[16];

*(STRUCT_8_BYTES *)dest = *(STRUCT_8_BYTES *)data;  // copies the first 8 bytes
}

This will beat all methods for less than about 48 bytes due to no setup time.

#32475 - crossraleigh - Thu Dec 23, 2004 4:08 am

Mike, isn't your technique the same as Cearn's TILE technique?

#32478 - tepples - Thu Dec 23, 2004 4:40 am

I think the TILE technique uses a struct of 32 bytes, the size of one 16-color tile and half the size of the ARM7 register file:
Code:
typedef struct TILECOPY {
  unsigned int a, b, c, d, e, f, g, h;
} TILECOPY;

CpuFastSet is just the TILE technique in an unrolled loop, possibly with Duff's device.