gbadev.org forum archive

Hi all. I've just put together a tile engine that dynamically loads tiles from ROM. The idea is that a map can reference up to 65536 unique tiles. Clearly, that's way too many, but 1008 is way too little.

Before I proceed any further with this discussion, here is a demo of my tile engine.

As is stands, in the worst case scenario, a vblank causes 52 tiles to be loaded from ROM into VRAM all at once. So far I haven't been able to do this within a single vblank. The "electron gun" catches up with the tile loading somewhere within the first 8 or so lines on the screen (yes I know, there is no electron gun in an LCD display). This is hardly noticable way up there at the top. When I change the code so that the hardware scrolling executes after the tile loading, I notice (during worst case scenario) that the screen is tearing almost halfway down. That should give an indication of how long it takes.

So here I am, with a pretty-much working engine, wanting to make a platform game. My question is this: in your opinion (and based on your experience), is there enough processing time remaining to implement a platform game, which includes another layer for paralax (without dynamic tile loading), sound & music, and half-decent gameplay?

Another question: is there a way of spreading my tile loading time so that I never load anywhere near 52 tiles in a frame?

Cheers.
- Steve

P.S. I'm testing on actual hardware.
P.S.S. Worst case scenario occurs when scrolling diagonally and both x and y are divisible by 8 (i.e. no remainder).

EDIT: I think I've sort of figured out a way around the worst-case scenario. But I can't be bothered figuring it out now because I need to go to bed. Basically, it's a memory-speed trade-off. But perhaps the trade-off is just that i lose 64 bytes of VRAM - that means I can have 319 (instead of 320) unique tiles to share among the other layer(s). The speed gain is the possibility of (almost) completely flattening the tile loading across frames.

removed

Last edited by Paul Shirley on Sun Mar 28, 2004 9:00 pm; edited 2 times in total

52 tiles @ 32 bytes/tile = 1664 bytes to transfer during VBlank

DMA takes approx 6 cycles per 4 bytes to transfer from ROM to VRAM (Cowbite).

Total transfer time should be around 2496 cycles.

This is just slightly over two scanlines, and VBlank lasts for 68 scanlines.

In other words, you should have enough time in VBlank to transfer about 30 times the number of tiles you're currently transferring in VBlank.

What's the problem?

Dan.

poslundc wrote:

Total transfer time should be around 2496 cycles.

I know I don't know what algo your using here so this might not be a valid point but, aren't you over simplifying? True, it would take that many cycles to transfer that many tiles (regardless if you split them up if my understanding is correct). But, if we are dynamically load and replacing tiles, we need to figure out what tiles are no longer needed, and then load the tiles that we need from the ROM. Save for this example, which is a picture, long strings of freedup slots in VRAM are not likely anymore than the need tiles from the ROM are likely to fall in order. So, true you'll have a total DMA time of 2496 cycles but what about the over head of finding tiles and initializing each transfer? I suppose that if you had a fast algo for determine what tiles can be throw out there would probobly be plenty of time left during the vblank for normal vblank activities but I'm just saying wouldn't the overhead be considerably more than merely two scanlines?

Steve++, you must be one hell of an artist if you need more than 1000 tiles. I know that I would never beable to use that many. ;)
_________________
"head straight for your goal by any means
there is a door that you've never opened
there is a window with a view you've never seen
get there no matter how long it takes"

-Theme of Shadow, Sonic Adventure 2

DarkPhantom wrote:

I know I don't know what algo your using here so this might not be a valid point but, aren't you over simplifying? True, it would take that many cycles to transfer that many tiles (regardless if you split them up if my understanding is correct). But, if we are dynamically load and replacing tiles, we need to figure out what tiles are no longer needed, and then load the tiles that we need from the ROM. Save for this example, which is a picture, long strings of freedup slots in VRAM are not likely anymore than the need tiles from the ROM are likely to fall in order. So, true you'll have a total DMA time of 2496 cycles but what about the over head of finding tiles and initializing each transfer? I suppose that if you had a fast algo for determine what tiles can be throw out there would probobly be plenty of time left during the vblank for normal vblank activities but I'm just saying wouldn't the overhead be considerably more than merely two scanlines?

The algorithm for selecting tiles is changeable and improvable. What can't be changed is the time required to transfer a volume of data from one area of memory into another. Even an unoptimized algorithm should only present minor overhead compared to the actual transfer time; in situations like this it is almost always the volume of data being transferred that becomes the issue, especially when dealing with slow memory access times for regions like ROM and EWRAM.

Besides, the point was he has more than enough time to transfer the tiles he wants into VRAM. Even if he's taking five times as long selecting which tiles he actually wants to transfer, he's still got plenty of time. But he shouldn't be taking that long; he only needs to loop over a row and column of an array.

And yes, Steve++ should probably rethink his design as well, as there is very little reason he should have to load tiles in dynamically like that. The most practical reason to reload tiles during VBlank is in order to have them animate; it's in order to avoid reloading tiles that we have sets and maps in the first place.

Dan.

Why load tiles in VBlank? Why not in normal time? I've got an engine dynamically -uncompressing- chars and loading them on the fly outside of interrrupts - and without visual glitching. Sure, it's a bit tricky, but the results are just beautiful. We can literally get 2 completely unique layers on screen with 2 layers of support and not take too much of a CPU hit to do it. It can be done :)

poslundc wrote:

And yes, Steve++ should probably rethink his design as well, as there is very little reason he should have to load tiles in dynamically like that. The most practical reason to reload tiles during VBlank is in order to have them animate; it's in order to avoid reloading tiles that we have sets and maps in the first place.

How would you alter the design if somebody wanted a single overworld map so large that it had more than 1,024 unique tiles?
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

My dinamic tile engine supports 4 layers all of which are dynamic and draw from a pool of arbitrary size. There is no speed issue even if every tile needs to be replaced. The trick is simply a trade off with memory. All you need is a very large inverse look up table that stores the ussage for each tile and its location on the screen. I have demo code available on my site for this.
[url]www.thepernproject.com[/url] -> demos -> nexus

The engine design is from boofly and all c++. By useing a couple of stacks and this large look up table overhead for determining which tiles need to be written is reduced dramaticaly. It ends up takeing 2 bytes for every unique tile in the tile set...so say you have a tile set with 4000 unique tiles you will loose 8KB of ewram which is a small trade off in my opinion. It has been over a year since i worked on this code and much of it may be out dated. I recoment only looking at the background.cpp and .h as well as the setup code in the main.c . Some if it is a bit "hacky" and for that I appolagize.
I hope this was of some help.
_________________
www.drunkencoders.com

tepples wrote:

How would you alter the design if somebody wanted a single overworld map so large that it had more than 1,024 unique tiles?

Has there ever been an RPG on the SNES/GBA that fits that description? ;)

If there has, surely a case like that is the exception and not the rule.

Hypothetically, though, if I were to code such a system I would try to at least break my map up into non-overlapping regions, so I would have my generic tileset plus specific tilesets for areas that required more characteristic features. I would then swap in tilesets as I needed them. Presumably it is reasonable to keep the special-feature areas from infringing upon each other for a distance at least the size of the GBA screen.

Or just DMA tiles in all the time, if you prefer. It seems that a far more reasonable use for this technique, though, would be when you don't have an actual "tile map" per se but instead a very large bitmap that you want to traverse.

Dan.

poslundc wrote:

tepples wrote:

How would you alter the design if somebody wanted a single overworld map so large that it had more than 1,024 unique tiles?

Has there ever been an RPG on the SNES/GBA that fits that description? ;)

I haven't played many RPGs, but at least Jurassic Park, an action game for Super NES, had a large (approx. 512x512 tiles) continuously scrolling overworld map. In fact, the engine loaded programs and data into the sound CPU to change tracks for different map areas while the player was playing.
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

Neat. Well, then, there clearly are cases where it's worth doing. I still think the majority of tile-based games don't require anything like that, though.

Dan.

Quote:

It ends up takeing 2 bytes for every unique tile in the tile set...so say you have a tile set with 4000 unique tiles you will loose 8KB of ewram which is a small trade off in my opinion

Ouchy.

You can set it up so that you only need a use count for all tiles currently loaded into VRAM. A little more RAM of overhead and that's it. We could scroll around on a 64K tile map right now if we really wanted - a 2048x2048 pixel completely unique char map though takes a hella lot of ROM space though (something like 2.2Mbytes!) :)

Thank you all for your thoughts on this. I've reposted the demo, complete with source. Please excuse HAM macroes, Hungarian notation, weird indenting (thanks to bugs introduced in Visual Ham 2.5), unused code that's just lying around, etc... I didn't intend to post the source initially, but I thought some of you may want to figure out just what the hell I'm doing. Please note that this is just my first stab at dynamically loading tiles.

Quote:

52 tiles @ 32 bytes/tile = 1664 bytes to transfer during VBlank

DMA takes approx 6 cycles per 4 bytes to transfer from ROM to VRAM (Cowbite).

Total transfer time should be around 2496 cycles.

Actually, i'm using 8-bit mode, so my tiles are 64 bytes each. So instead of being able to do 30 times the amount of transfers, it's 15 times. But yeah, I get your point. Of course, I need some cycles to actually setup the transfer. Probably around 9 to 12. So now, that's 5 or so times I can transfer that data. Then there's the issue of setting 651 elements in the tilemap (I'm only using a smallest possible portion of a 32x32 tilemap so I can squeeze 11 more tiles out of VRAM for the other background(s)).

My thoughts on this are that if I implement an intelligent algorithm that loads tiles only when not in VRAM, I'm going to add extra processing and memory overheads, yet my worst-case-scenario for tile loading will pretty much be the same (maybe less 2 tiles). So I may be worse off doing this.

Quote:

Why load tiles in VBlank? Why not in normal time? I've got an engine dynamically -uncompressing- chars and loading them on the fly outside of interrrupts - and without visual glitching. Sure, it's a bit tricky, but the results are just beautiful. We can literally get 2 completely unique layers on screen with 2 layers of support and not take too much of a CPU hit to do it. It can be done :)

Thanks for you ideas and words of motivation. Obviously if you were willing to share the code you would have by now, so I won't ask that question. Instead I'll look deeper into the problem.

Quote:

My dinamic tile engine supports 4 layers all of which are dynamic and draw from a pool of arbitrary size. There is no speed issue even if every tile needs to be replaced. The trick is simply a trade off with memory. All you need is a very large inverse look up table that stores the ussage for each tile and its location on the screen. I have demo code available on my site for this.

Four layers, all dynamic? How do you do that? The worst-case scenario is that you need 2,604 tiles loaded into VRAM at once. How do you get around this? By the way, I couldn't find your source for this demo.

Quote:

Neat. Well, then, there clearly are cases where it's worth doing. I still think the majority of tile-based games don't require anything like that, though.

Yeah, I was kind of hoping to rise above the majority.

Quote:

Ouchy.

You can set it up so that you only need a use count for all tiles currently loaded into VRAM. A little more RAM of overhead and that's it. We could scroll around on a 64K tile map right now if we really wanted - a 2048x2048 pixel completely unique char map though takes a hella lot of ROM space though (something like 2.2Mbytes!) :)

I was thinking that two bytes per tile would be to used to map tilemap entries to tile data indices. Clearly, 651 exceeds the 8-bit range. You're probably talking about something else - a usage counter for implementing a least-recently-used algorithm. Actually, I was thinking of used a linked list for LRU.

Thanks again.
- Steve

removed

Last edited by Paul Shirley on Sun Mar 28, 2004 9:00 pm; edited 1 time in total

Paul, thanks for taking time to look at my ugly source.

I'm using the HAM build of gcc and also Visual HAM as the IDE. I'm even using some macros, but i'm not going anywhere near that library. Thanks for pointing out the inefficiency of my code.

I've been thinking a lot about this (sleepless nights). I was thinking of using a loading system whereby tiles are loaded only when needed (like virtual memory). Then i realised the worst-case processing is pretty bad anyway. Today I thought about using hblanks as triggers to load a couple of rows' tiles at a time. Then there would easily be enough vram for all four layers. In fact, I could use the full 32x32 tilemap for each layer (instead of the inefficient crap my current effort does) and avoid a lot of costly processing.

You're definintely right about seperating tile-loading from map-loading. It was weighing on me ever since i coded that garbage! With the new scheme that I'm brewing, I can just load map sections during vblank. Once that's done, I can use the vram tilemap to determine which tiles needed to be loaded, where to load them and when to load them. I can just put this in an array and process it at the appropriate hblanks. For speed, I probably need to cache, in IWRAM, a 32x32 section of each layer's map so I can quickly refer to it during tile loading.

I just hope I have enough cpu left for krawall :)

removed

Last edited by Paul Shirley on Sun Mar 28, 2004 9:01 pm; edited 1 time in total

Quote:

Four layers, all dynamic? How do you do that? The worst-case scenario is that you need 2,604 tiles loaded into VRAM at once. How do you get around this? By the way, I couldn't find your source for this demo.

First, yes all four layers are dynamic and draw from the same tile set. Yes the worst case scenario is more tiles then there is room for but your map must be designed in a way such that there are never more then 700 or so unique tiles on screen at any one time. (I am curently working on a map/world editor that will report this value.) If this is not the case you have probably reached the point of diminishing returns for tile modes and would be better off just useing a bitmap mode. By useing 16 color tiles you can double the onscreen limit. For some reason my site is not responding here. When it is back up just follow the link to nexus under the gba->demo section. There should be the entire project there...if the link is broken i will attempt to fix when i have normal internet access again.

Quote:

You can set it up so that you only need a use count for all tiles currently loaded into VRAM. A little more RAM of overhead and that's it. We could scroll around on a 64K tile map right now if we really wanted

The large array is used to tell me if that tile is allready loaded and where it is loaded...(sudo code)

Code:

//if the inverse LUT is -1 that tile is not in gba vid mem yet
//The inverse LUT is simply a table of s16 that is initialized to -1 and
//has enough entries for each tile in the tile set. If, when you go to draw
//your map, you find that it is still -1 then you know that the graphics are
//not in memory for that tile and need to be copied in. This also requires
//that you keep a stack to store which slots are free in the gba video
//memory. This stack will have an entry that denotes the number of times
//that tile appears on screen.
if(inverseLUT[map[x][y]] > -1)
{
...store inverseLUT value into gba map memory at corect x,y
...increment usage counter for that tile
}
else
{
...find next free slot in the tile set...(this is were the usage counter /stack comes in)
...upload new graphics into that tile slot
...set inverseLUT to point to that slot
...store this value also in the gba map memory
}

this method requires that you also process the tile row and collumb that just left the screen so that you can reduce the tile usage counter for those tiles and free the tile memory if that was the last of its type. If there is a better way to do this I would love to see it. I am by no means an expert in these things but this method has proven fast for me while being strait c++ code and was given to me by someone I do consider an expert in the area...which reminds me [url]www.2headed.com [/url] is his site and he may have example code up for this. There is very little over head when the tiles graphics is allready stored in memory. I truely would like to see other methods of doing this as this is one of the key abilities in any tile engine and this is the only way i know to do it quickly.
_________________
www.drunkencoders.com

We create a hash table (somwehat like your reverse LUT) from the ROM character indexes. This table allows us to get directly to any char loaded in the map 99.99% of the time with a single read (we have something like 1 collision in a 10,000 with our simple hash function of a single and.) So that table (in slow ram) plus a usage per char on loaded into VRAM is how we do it and it is very fast - fast enough that we can decompress chars on the fly and load them into the leading edge without too much issue. We then get away with sharing buffers to allow all 4 planes to use it if they want (sounds like it isn't a secret :) )

We originally had a max char per layer of 651 (31x21), but in order to make the system a bit more flexible, we upped it to 704 (32x22.) Many ways to skin this cat it appears. I'd love to see what kind of performance you guys get.

Quote:

Yes the worst case scenario is more tiles then there is room for but your map must be designed in a way such that there are never more then 700 or so unique tiles on screen at any one time.

I'm trying to get around putting constraints like that on the tilemaps. I hope to get four layers without a limit on the number of unique tiles being displayed at once. That means it is possible for up to 2604 unique tiles to be seen at once.

I've basically got the algorithm figured out in my head. The vram tilemaps won't be changed at all. Their entries will just point to the appropriate areas in vram where tile data will be loaded. I'll maintain a 32x32 software window (iwram) into the real tilemap and that will be used to schedule tile loading. The sceduling will occur during vblank and the loading of tiles (62 at a time) will occur during hblank. There's a 16-line (minus transfer time) grace period involved, so no layer will get in the way of another one, with respect to scheduling. This will probably be irq based.

The benefit of my new scheme is that absolute scrolling is now a possibility, so I don't have to perform a series of relative scrolls to get where I want.

Wow. Copying in tile data following the raster? Talk about Atari 2600-esque. Will you be able to pull it off at full speed?
_________________
-- Where is he?
-- Who?
-- You know, the human.
-- I think he moved to Tilwick.

62 * 32 bytes ~ 2000 bytes * 11 times on screen 22,000+ cycles with DMAs for copying = A hella lot of DMA/interrupt cycles.

Even loading 2 rows at a time means you are going to be spending a ton of time in HBlank interrupts - enough that I believe any Music you are playing is going to glitch badly and play slowly. Also, no way to get multi-player to work with this as there won't ever be enough time to service a serial interrupt successfully. Also, you're only going to be able to service your IWRAM map in VBlank to prevent glitching. Finally, 32x32 x 32 bytes per tile = 32K of IWRAM. That won't work. Next is to store a ROM pointer at each location wich brings it down to 4K, but cuts your transfer speed in at least half. You may need to map the data in such a way that you only need to store 32 x 22 tiles of data in RAM. This still eats all your IWRAM (leaving room for stack and a bare minimum of data.) but at least allows full speed DMAs from IWRAM.

I'm curious on how you intend to overcome these issues :)

Quote:

I'm curious on how you intend to overcome these issues :)

So am I!

gbadev.org forum archive

Coding > Too ambitious?

#17269 - Steve++ - Fri Mar 05, 2004 2:48 pm

#17271 - Paul Shirley - Fri Mar 05, 2004 3:30 pm

#17272 - poslundc - Fri Mar 05, 2004 3:30 pm

#17279 - DarkPhantom - Fri Mar 05, 2004 5:36 pm

#17281 - poslundc - Fri Mar 05, 2004 6:10 pm

#17283 - Miked0801 - Fri Mar 05, 2004 7:10 pm

#17286 - tepples - Fri Mar 05, 2004 9:11 pm

#17289 - dovoto - Fri Mar 05, 2004 9:19 pm

#17293 - poslundc - Fri Mar 05, 2004 9:28 pm

#17294 - tepples - Fri Mar 05, 2004 9:37 pm

#17297 - poslundc - Fri Mar 05, 2004 9:46 pm

#17299 - Miked0801 - Fri Mar 05, 2004 10:00 pm

#17334 - Steve++ - Sat Mar 06, 2004 3:48 pm

#17341 - Paul Shirley - Sat Mar 06, 2004 5:22 pm

#17405 - Steve++ - Sun Mar 07, 2004 2:41 pm

#17414 - Paul Shirley - Sun Mar 07, 2004 4:47 pm

#17416 - dovoto - Sun Mar 07, 2004 5:41 pm

#17423 - Miked0801 - Sun Mar 07, 2004 8:51 pm

#17436 - Steve++ - Mon Mar 08, 2004 4:52 am

#17445 - tepples - Mon Mar 08, 2004 9:38 am

#17457 - Miked0801 - Mon Mar 08, 2004 8:05 pm

#17817 - Steve++ - Mon Mar 15, 2004 8:25 am