#149176 - JanoSicek - Wed Jan 16, 2008 11:27 am
Hello
I am writing a scrolling engine on tiled backgrounds for arbitrary sized maps and up to 65535 tiles per background. I am using three layers: BG0, BG1 and BG2.
At the END of each VBLANK I check which tiles I need to load, and put them in a buffer. Then at start of VBLANK I load the tiles in the VRAM and switch to the precomputed new map (double buffering)
The problem is that this process is quite slow for me.
A small scroll of 8 pixels to the right consumes about 60 horizontal scanlines worth of time to preprocess what to load. The loading then consumes about 2 scanlines, so that's fine.
A big scroll (or jump) for 64 pixels, consumes more than 192 scanlines, making the scrolling skip the frame (60fps -> 30 fps)
And an arbitrary jump in the map consumes 300+ scanlines, sometimes skipping two frames (60fps -> 20 fps)
Is my implementation too slow?
It is quite problematic to implement the dynamic tile loading. I am storing all tiles loaded in a hash table (2048 buckets, each is 16bit wide, linear probing)
Total I have 3072 tiles for all three background layers, as they are independent of each other.
I was optimizing this process quite a lot, so I'm quite happy that I can do a single layer scroll in about 20 scanlines, which is about 1200 microseconds. Is this enough for a game? Will I have enough ticks left to do the 'game' itself?
Also, if you have the source of some fast scroller which can work with more than 1024 tiles, can you point me to it?
#149178 - PypeBros - Wed Jan 16, 2008 1:05 pm
Maybe you could try to pre-process the level map so that you can split tiles in "frequent" or "infrequent" and swap banks of "infrequent" tiles as you enter/leave the neighbourhood of a graphic element that requires the new "subset" of tiles.
Perfect example would be e.g. a Beat'm'up game, where you have parts of your tileset that appears everywhere (road, trees, etc.) and other parts that are used together to build a larger image (e.g. a temple), but you don't use temples tiles together with village tiles, even in a single level. That could let you introduce "hints" in your level such as "drop village tiles and replace them by temple tiles when we approach the "temple >" sign...
Is there any reason why you expect scrolling "jumps" of >16 pixels per frame ? Can you live with a fade-in/fade-out transition between two buffers in those cases?
_________________
SEDS: Sprite Edition on DS :: modplayer
#149185 - Dwedit - Wed Jan 16, 2008 1:40 pm
Here's my suggestion.
So we have "virtual" tiles, which is what is specified, and "physical" tiles that actually sit in VRAM.
Use two tables:
* Lookup table from virtual tiles to physical tiles.
* Back table from physical tiles to virtual tiles
* Count table for each physical tile. This keeps track of how many times a physical tile appears in the map.
Then we have a "FIFO queue", really just use a counter for tile numbers. When we add new tiles, we check the count table to see if we can replace that tile or not, and skip over tiles which are on the map.
Adding tiles to the map always knocks off an old tile. Be sure to initialize the count table correctly to what's actually in the tilemap.
When adding tiles to the map the procedure is:
* knock off the old tile
- - decrease its entry in count table
* add new tile
- - Look it up in the lookup table
- - Tile Exists?
- - - Cool, Increase its entry in count table
- - Tile doesn't exist?
- - - Look in count table under cursor
- - - Count is zero?
- - - - Make the lookup for the old tile invalid (you need the back table for this)
- - - - Set the lookup for the new tile
- - - - Increase entry in count table
- - - Count isn't zero?
- - - - Skip that, move the cursor to the next tile, repeat
You can possibly reduce the size and complexity of the lookup tables by using "Metatiles", which are composed of multiple 8x8 tiles. Maybe 16x16, or 32x32. Have them be composed of sequential physical 8x8 tiles in a box shape.
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."
#149189 - JanoSicek - Wed Jan 16, 2008 2:26 pm
Dwedit wrote: |
Here's my suggestion.
So we have "virtual" tiles, which is what is specified, and "physical" tiles that actually sit in VRAM.
Use two tables:
* Lookup table from virtual tiles to physical tiles.
* Back table from physical tiles to virtual tiles
* Count table for each physical tile. This keeps track of how many times a physical tile appears in the map.
Then we have a "FIFO queue", really just use a counter for tile numbers. When we add new tiles, we check the count table to see if we can replace that tile or not, and skip over tiles which are on the map.
Adding tiles to the map always knocks off an old tile. Be sure to initialize the count table correctly to what's actually in the tilemap.
When adding tiles to the map the procedure is:
* knock off the old tile
- - decrease its entry in count table
* add new tile
- - Look it up in the lookup table
- - Tile Exists?
- - - Cool, Increase its entry in count table
- - Tile doesn't exist?
- - - Look in count table under cursor
- - - Count is zero?
- - - - Make the lookup for the old tile invalid (you need the back table for this)
- - - - Set the lookup for the new tile
- - - - Increase entry in count table
- - - Count isn't zero?
- - - - Skip that, move the cursor to the next tile, repeat
|
This is very similar to what I've been doing.
My arrays:
u16 loadedtile[1024] - what tile is loaded in each of 1024 slots?
u8 loadedage[1024] - how long since its last use
u16 hashtable[2048] - this is reverse lookup if we want to know the slot in which a tile 0-65536 is loaded.
At the start of each scroll, I increase loadedage by 1 for all 1024 slots.
Then for the screen 33x25 I check:
a) tile currently in the map is the wanted one = loadedtile[whatswritten]==shouldbeloaded
we only set loadedage[whatswritten] to 0
b) tile in map is incorrect one, and the tile that should be loaded is in some slot = loadedtile[hashtable[hash(shouldbeloaded)]]==shouldbeloaded
we write this slot to the map and set its age to 0
map=slot
loadedage[slot]=0
c) tile in map is incorrect one, and the tile is not loaded anywhere
we write this tile to the buffer of tiles to load
After checking the whole map, if we need to load some tiles, we evict the same number of tiles with highest loadedage[x] and then load the new tiles in these slots.
The hash table is kept up to date at all times of course.
But even this effective algorithm is taking more than 1 whole frame when scrolling three backgrounds at once...
Of course, if you scroll only in small increments, then it works smoothly. However if you move the background by pen, and do a very quick movement, you can move it by 100+ pixels, and then it consumes 2/3 frames to redraw the scene.
Fact: If you do such a big jump, it does not really matter if it is drawn slowly, as the big jump broke the smoothness anyway. However I am afraid that the whole game will slow down and it will be noticable -- if you do quick scrolling left and right, you will slow down all monsters and processes in the game...
#149193 - Dwedit - Wed Jan 16, 2008 3:48 pm
You should only be concerned with the columns or rows which change as a result of scrolling.
The LRU algorithm where you age a bunch of tiles, and the need to iterate over the whole tilemap is really slowing your algorithm down. Iterating over big arrays is slow. If you eliminate LRU, then you don't need to iterate over the tilemap anymore. You can use the tile count system instead to determine which tiles are not replaceable, since the tile count system only looks at when tiles are added and removed, not scanning over an entire map.
I suggest you give my algorithm a try. It doesn't need to age tiles, and doesn't need to iterate over a tilemap. You can also replace the big virtual->physical lookup table with a hash table, if the hash table doesn't have much
A full lookup table for 65536 tiles is 128K large. A 2048 entry hash table would easily be under 16K in the worst case.
_________________
"We are merely sprites that dance at the beck and call of our button pressing overlord."
#149194 - JanoSicek - Wed Jan 16, 2008 3:56 pm
Dwedit wrote: |
You should only be concerned with the columns or rows which change as a result of scrolling.
The LRU algorithm where you age a bunch of tiles, and the need to iterate over the whole tilemap is really slowing your algorithm down. Iterating over big arrays is slow. If you eliminate LRU, then you don't need to iterate over the tilemap anymore. You can use the tile count system instead to determine which tiles are not replaceable, since the tile count system only looks at when tiles are added and removed, not scanning over an entire map.
I suggest you give my algorithm a try. It doesn't need to age tiles, and doesn't need to iterate over a tilemap. You can also replace the big virtual->physical lookup table with a hash table, if the hash table doesn't have much
A full lookup table for 65536 tiles is 128K large. A 2048 entry hash table would easily be under 16K in the worst case. |
It looks like I will have to ditch LRU. Then I can only check new rows which scroll onto the screen and old rows which are scrolled away from the screen.
At the moment I have a hash table for lookup. As I need only 1024 tiles, my hash table has size of 2048 to have small number of collisions. Each bucket is 2 bytes wide, so it is 4kB total per screen.
#149197 - PypeBros - Wed Jan 16, 2008 5:26 pm
JanoSicek wrote: |
At the moment I have a hash table for lookup. As I need only 1024 tiles, my hash table has size of 2048 to have small number of collisions. Each bucket is 2 bytes wide, so it is 4kB total per screen. |
<2,cent>
i'm not sure i got why you are bothering about removing the "most aged" tile.
Your screen can display only 768 tiles at a time, right (32x24), and you'll likely to need a couple of extra tiles for 'to appear' tiles in the next frame.
Indeed, just counting references to tiles and putting those tile numbers whose ref. have reach 0 in some sort of "ready-for-reuse" list, as dwedit suggested, would do the trick (imvho).
@dwedit: amazing how much your algorithm looks like page replacement algorithms in MMU code for operating system kernels ^_^
_________________
SEDS: Sprite Edition on DS :: modplayer