gbadev.org forum archive

This is a read-only mirror of the content originally found on forum.gbadev.org (now offline), salvaged from Wayback Machine copies.

DS development > Streaming textures a la Orcs & Elves

#173107 - gusano - Mon Mar 22, 2010 10:34 pm

In a developer diary for Orcs & Elves, Katherine Anna Kang mentions that the game reserved 128k of VRAM for streaming in dynamic textures for the current frame and another 128k for the next frame. Could someone please explain how they do this? I've tried loading textures on the fly right before rendering, but I get a bad flicker. Can someone please help me understand how to implement streaming textures?

Additional questions:

-Approximately how much texture data can be streamed per frame?

-Can I have the texture data in ram and then upload to vram using glTexImage2D and gluTexLoadPal? Or is this too slow?

-If I'm noticing flicker, it could mean that I'm erasing and reloading textures while the screen is being drawn (I think this is called VBlank)... I'm calling GFX_FLUSH = 0, which should do the same as swiWaitForVBlank()... maybe I'm not understanding what VBlank is...

Thanks in advance for any help!

#173108 - sajiimori - Mon Mar 22, 2010 11:27 pm

Clarification: VDraw is when the screen is being drawn, and VBlank is when it's not (and it's safe to upload textures).

I don't know precisely how much you can upload during VBlank, but 128k seems like a good starting point.

Streaming from RAM is normal. I also frequently stream texture animations from ROM to RAM, then RAM to VRAM, all in the same game loop. However, I normally target 30fps, whereas O&E is a silky-smooth 60.

Getting to the point: GFX_FLUSH=0 doesn't actually wait for VBlank. You still need to call swiWaitForVBlank().

#173109 - DiscoStew - Mon Mar 22, 2010 11:29 pm

If I understand the O&E method, it's like double-buffering, but for textures. When a frame is rendered with one bank, the other bank is being updated with the textures needed for the next frame, and they alternate in this manner.

I'm not sure how much data could be written to VRAM in an entire frame (though it isn't practical to try that, since you need to run the game engine too), but if I remember correctly, I was able to write 256KB into VRAM during the time the 3D hardware isn't buffering, which is from scanline 144 to 214 (about 70 scanlines), provided the buffer isn't lagging behind because of intensive rendering.
_________________
DS - It's all about DiscoStew

#173110 - gusano - Mon Mar 22, 2010 11:48 pm

Ok, things are starting to become clearer...

My main loop is as follows...

while(1)
{
    UpdateInput();
    UpdateGameState();

    // these don't actually draw... they queue up "render calls"
    DrawLevel();
    DrawSprites();

    // delete all textures
    ResetTextures();

    // this goes through all the render calls and determines which textures are needed... it moves them from ROM to VRAM, I think...
    LoadNecessaryTextures();

    // this actually does the glBegin(), glVertex() and glEnd() calls
    Render();

    GFX_FLUSH = 0;
}

My swiWaitForVBlank() call was at the end of the main loop, after GFX_FLUSH = 0, but I read somewhere that GFX_FLUSH = 0 had the same effect, which is why I removed it. But if you say I need to wait for VBlank before uploading textures, then I'm thinking swiWaitForVBlank() should go before ResetTextures()... am I right?

I've created my textures using the Nitro3D converter, so I get .bin files for textures and palettes, which are put into the NDS file, plus a .h file for the texture and another for the palette. So I'm assuming all texture data is in ROM. Then I use the pointers in those .h files to load the textures with glTexImage2D and glTexLoadPal, which means the data has moved from ROM to VRAM. Is this the way to go?

#173111 - sajiimori - Tue Mar 23, 2010 12:10 am

Are you targeting 60fps? If so, that keeps things simple, because you don't have very many choices to make. =)

The following loop seems like a good approach to me, to avoid graphical corruption if the framerate drops. Anyone know if it works?
Code:
DoGameStuff();
Render();
ResetTextures();
swiWaitForVBlank();
LoadTextures();
GFX_FLUSH = 0;

That way, if the code from DoGameStuff up to ResetTextures takes longer than 17ms, the screen won't corrupt from doing a page flip too early (which would display polys that use textures that haven't been uploaded yet).

I could be off-track here, because I don't know libnds at all. At any rate, the #1 requirement is to upload textures during VBlank only, and the secondary requirements are: do as much work as possible before VBlank (such as rendering), and make sure there isn't corruption if your framerate dips for any reason.

Edit: If my above code doesn't work on the DS (particularly the GFX_FLUSH being in VBlank), another approach would be to upload textures an extra frame ahead of time. Then you can take your sweet time with all your game logic and rendering, and page flip anytime you want without fear of corruption, since the textures you want were ready long ago. Make sense?

#173113 - gusano - Tue Mar 23, 2010 12:21 am

I tried your code, sajiimori, but it didn't work... I didn't see any graphics output...

#173114 - gusano - Tue Mar 23, 2010 12:27 am

By "uploading textures an extra frame ahead of time" do you mean this?

Code:


DoGameStuff();
Render();
GFX_FLUSH = 0;
swiWaitForVBlank();
ResetTextures();
LoadNextFrameTextures();

#173115 - DiscoStew - Tue Mar 23, 2010 12:40 am

If you are trying to load textures while the 3D is being rendered to the buffer, you'll probably see some problems, because something like glTexImage2D sets all the main banks to LCDC mode, loads the texture, then sets them back to what they were before.

#173116 - gusano - Tue Mar 23, 2010 12:47 am

How can I know if I'm loading textures during the time the 3d gets rendered to the buffer?

How can I know what fps I'm running at?

#173117 - sajiimori - Tue Mar 23, 2010 1:34 am

Heh, I didn't expect my code to work verbatim, gusano. No matter what you do, you have to think it through, and do it in a way that makes sense for your codebase and the hardware.

#173121 - sverx - Tue Mar 23, 2010 12:17 pm

I guess you need to load the textures for the frame after next into 'the other VRAM bank' as soon as VBlank starts, so you'd be loading textures into bank B, for instance, while the 3D engine is rendering using the bank A textures (which, of course, must have been loaded earlier)...

I've never done that personally; this is just how I would approach it.

#173122 - wintermute - Tue Mar 23, 2010 12:56 pm

sajiimori wrote:


Getting to the point: GFX_FLUSH=0 doesn't actually wait for VBlank. You still need to call swiWaitForVBlank().


Actually, that's not correct. What that does is tell the hardware to swap the vertex and polygon buffers, which doesn't take place until the next VBlank. Any attempt to write to those buffers before the swap is complete will stall until it is.
_________________
devkitPro - professional toolchains at amateur prices
devkitPro IRC support
Personal Blog

#173124 - gusano - Tue Mar 23, 2010 4:28 pm

Thanks wintermute for the explanation on GFX_FLUSH...

sajiimori: I made a mistake when I said your code didn't output any graphics. You were right about tailoring it to my codebase: it turns out I was clearing my "render call" array before rendering, which is why nothing was shown on screen. Unfortunately, I'm still seeing a flicker. On DS hardware the problem shows up all the time; on No$GBA it's only noticeable when the number of necessary textures changes. For example, if my render requires 4 textures and I reset and reload those same 4 textures, it looks fine on No$GBA. But if the game state changes and I only need 3 textures, there's a flicker when it switches from 4 textures to 3, and then it seems fine while on 3 textures. If the game changes and I require 4 textures again, it flickers again. Does glResetTextures() work correctly? According to this post, there may be a bug in that function:

http://forum.gbadev.org/viewtopic.php?t=16502&highlight=glresettextures

I used the devkitpro updater 1.5.0 to install libnds...

EDIT: I have libnds version 1.4.3

#173129 - sajiimori - Tue Mar 23, 2010 6:56 pm

wintermute, I don't understand. Doesn't it queue the buffer swap? I do recall the caveat you mentioned, which is that it can indirectly cause a VBlank wait, if you try to do additional 3D work before the swap occurs.

But in gusano's case, being a simple game loop without a VBlank interrupt handler, isn't swiWaitForVBlank needed, just so he can be sure to correctly time his texture uploads?

Still, I wasn't sure if my sample game loop would work (at 60fps), because GFX_FLUSH is after swiWaitForVBlank in my example. Assuming LoadTextures completes quickly enough, will the GFX_FLUSH apply to the upcoming VDraw, or will it apply in the next VBlank and slow the game to 30fps?

#173131 - wintermute - Tue Mar 23, 2010 7:14 pm

It'll apply in the next vblank and slow the game to 30fps. The queued buffer swap occurs quite early in the vblank period.

I've tried something similar to this in the past but I ended up setting the texture banks manually & double buffering. i.e. use bank a/b for one frame, c/d for the next so there was always one set mapped to LCDC mode & one set mapped to texture.
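The 30fps outcome can be checked with a small host-side C model. This is a sketch under simplifying assumptions, not DS code: 263 scanlines per frame, VBlank starting at line 192, the queued swap taking effect exactly at the start of VBlank, and geometry writes stalling until a pending swap completes (as wintermute describes). `next_vblank_after` and `frames_per_swap` are hypothetical names; times are absolute scanline counts.

```c
#include <assert.h>

/* Simplified timing model (assumptions): 263 scanlines per frame, VBlank
   starts at line 192, and a queued GFX_FLUSH swap takes effect exactly at
   the start of VBlank. Times are absolute scanline counts. */
#define LINES_PER_FRAME 263L
#define VBLANK_START    192L

/* First VBlank start strictly after t (what swiWaitForVBlank() waits for). */
static long next_vblank_after(long t) {
    long start = (t / LINES_PER_FRAME) * LINES_PER_FRAME + VBLANK_START;
    return start > t ? start : start + LINES_PER_FRAME;
}

/* Frames between the last two buffer swaps of a simulated game loop.
   work:   lines spent on game logic + geometry submission
   upload: lines spent uploading textures after the VBlank wait
   flush_after_wait = 1: work; swiWaitForVBlank; upload; GFX_FLUSH
   flush_after_wait = 0: work; GFX_FLUSH; swiWaitForVBlank; upload */
static long frames_per_swap(int iterations, long work, long upload,
                            int flush_after_wait) {
    long t = 0, swap = 0, prev_swap = 0;
    assert(iterations >= 2 && work > 0 && upload > 0);
    for (int i = 0; i < iterations; i++) {
        if (t < swap)
            t = swap;                      /* geometry writes stall until swap */
        t += work;
        prev_swap = swap;
        if (flush_after_wait) {
            t = next_vblank_after(t);      /* swiWaitForVBlank() */
            t += upload;                   /* LoadTextures() */
            swap = next_vblank_after(t - 1); /* GFX_FLUSH: next VBlank start >= t */
        } else {
            swap = next_vblank_after(t - 1); /* GFX_FLUSH during VDraw */
            t = next_vblank_after(t);
            t += upload;
        }
    }
    return (swap - prev_swap) / LINES_PER_FRAME;
}
```

With 100 lines of game work and 20 lines of texture upload, `frames_per_swap(4, 100, 20, 1)` gives 2 (30fps) while `frames_per_swap(4, 100, 20, 0)` gives 1 (60fps). In the second order the uploads have to fit in the short window before the 3D engine starts on the next frame, or go to banks it isn't reading.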

#173132 - sajiimori - Tue Mar 23, 2010 7:23 pm

Oh yeah, that's a nice scheme, since my idea doesn't work. I'm so used to being forced to upload textures in VBlank that I never considered the alternative: don't render from some banks of VRAM, and upload to those banks anytime you want.

#173154 - sverx - Wed Mar 24, 2010 12:28 pm

wintermute wrote:
[...] I ended up setting the texture banks manually & double buffering. i.e. use bank a/b for one frame, c/d for the next so there was always one set mapped to LCDC mode & one set mapped to texture.


I think that's exactly:

Quote:
In a developer diary for Orcs & Elves, Katherine Anna Kang mentions that the game reserved 128k of VRAM for streaming in dynamic textures for the current frame and another 128k for the next frame.


(well, using one VRAM bank at a time instead of two...)

#173155 - TwentySeven - Wed Mar 24, 2010 1:24 pm

I've managed to copy about 128KB max during the safe time at "vblank". How are you managing double that?

#173156 - wintermute - Wed Mar 24, 2010 1:54 pm

What sajiimori said :p

#173158 - sverx - Wed Mar 24, 2010 3:39 pm

TwentySeven wrote:
Ive managed to copy about 128kb max during the safe time at "vblank". How are you managing double that?


You should have a whole frame (1/60th of a second) to load contents in the 'unallocated' VRAM banks...

#173161 - gusano - Wed Mar 24, 2010 4:08 pm

@ TwentySeven:

He's not copying 256k during VBlank, he's doing it during a whole frame... that's because nothing during that frame uses those VRAM banks. So, for example:

[Frame 1]
render using banks A & B
copy next frame's textures into banks C & D

[Frame 2]
render using banks C & D
copy next frame's textures into banks A & B

[Frame 3] (same as frame 1)
render using banks A & B
copy next frame's textures into banks C & D

and so on...
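That schedule boils down to frame parity. Below is a host-side C sketch, with hypothetical names (`BankSet`, `FramePlan`, `plan_for_frame`), that only decides which pair of banks to render from and which to upload into; it doesn't touch hardware. On a DS the remapping itself would presumably be done with libnds' `vramSetBankA()` to `vramSetBankD()` (switching between the `VRAM_x_TEXTURE` and `VRAM_x_LCD` types) at a point where the 3D engine isn't reading texture memory, and each frame's texture parameters would have to point at wherever that set is mapped. Both details are assumptions left out of the sketch.

```c
#include <assert.h>

/* Hypothetical labels for the two halves of texture VRAM in the schedule:
   banks A & B versus banks C & D. */
typedef enum { SET_AB = 0, SET_CD = 1 } BankSet;

typedef struct {
    BankSet render_from;  /* mapped as texture memory during this frame */
    BankSet upload_into;  /* mapped to LCDC: CPU/DMA may write any time */
} FramePlan;

/* Frames are numbered from 1, as in the schedule above. */
static FramePlan plan_for_frame(unsigned frame) {
    FramePlan p;
    assert(frame >= 1);
    p.render_from = (frame % 2 == 1) ? SET_AB : SET_CD;
    p.upload_into = (p.render_from == SET_AB) ? SET_CD : SET_AB;
    return p;
}
```

The property that matters is that whatever is uploaded in frame k is rendered from in frame k+1, while the pair being rendered from is never written by the CPU during that frame.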

EDIT: oops sverx already said it...

#173162 - Miked0801 - Wed Mar 24, 2010 4:48 pm

Double buffering - giving up half your RAM to allow writes at any time. Expensive RAM tradeoff for performance gain.

#173163 - gusano - Wed Mar 24, 2010 4:52 pm

In my case, all my textures exceed the VRAM texture limit... so I MUST use texture streaming / double buffering... either that OR:

-using the compressed texture format (which I don't understand yet)
-using only 16 color textures and scale them down to fit all of them...
-get rid of some art

#173164 - sverx - Wed Mar 24, 2010 5:16 pm

gusano wrote:
In my case, all my textures exceed the vram texture limit... so I MUST use texture streaming / double buffering


I don't get it... if you need more than 512KB for all your textures, can you really do what you want using just 256KB of textures per frame?

#173165 - gusano - Wed Mar 24, 2010 5:44 pm

I think I phrased it incorrectly... I am doing a platformer game... each level requires more than 512KB of textures... but I only need about 256k per frame because not every part of the level is visible each frame... and since a lot of textures are sprite sheets (animations), I don't need both the RUN and JUMP animations loaded to be rendered on the same frame...

#173168 - elhobbs - Wed Mar 24, 2010 6:48 pm

For cquake I used custom texture code with 8-bit textures. I partitioned VRAM into bins like so:
VRAM_A+VRAM_B = 32 textures at 128x64
VRAM_C+VRAM_D = 6 textures at 256x128
this leaves the remaining 64KB of VRAM_D for 64 textures at 32x32
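The partition can be sanity-checked with a few lines of arithmetic. This C sketch assumes 1 byte per texel (8-bit, 256-color textures, with palettes stored elsewhere) and 128KB per main VRAM bank; `bin_bytes` and `VRAM_BANK_BYTES` are hypothetical names.

```c
#include <assert.h>

#define VRAM_BANK_BYTES (128UL * 1024UL) /* main VRAM banks A-D: 128KB each */

/* Bytes for `count` textures of w x h texels at 1 byte per texel (8-bit). */
static unsigned long bin_bytes(unsigned count, unsigned w, unsigned h) {
    assert(w > 0 && h > 0);
    return (unsigned long)count * w * h;
}
```

`bin_bytes(32, 128, 64)` is 262144, exactly banks A+B; `bin_bytes(6, 256, 128)` is 196608 (192KB), and the 64 small textures take `bin_bytes(64, 32, 32)` = 65536, the last 64KB of VRAM_D. So the three bins fill A to D with nothing left over.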

I then kept track of the frame in which each slot was used and the ID of the texture in the slot. If a slot was not used in the previous frame, it could be reused; if the texture ID in the slot did not match, the texture would need to be reloaded. I did not bother trying to load only during VBlank. I only unlocked the bank that needed data loaded into it, and I kept it unlocked for as short a time as possible: unlocked right before the dmaCopy and locked again right after. This does result in the occasional black line on the screen, but it is fairly minimal.
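A minimal host-side C sketch of that bookkeeping for one bin. The names (`Slot`, `SlotBin`, `bin_acquire`, `NUM_SLOTS`) are hypothetical, the bin is shrunk to 4 slots, and the actual upload is replaced by a reload counter; on hardware that is where only the affected bank would be mapped to LCDC around a `dmaCopy` and then mapped back. A slot is treated as reusable when it was referenced in neither the current nor the previous frame, and a matching texture ID is reused without reloading.

```c
#include <assert.h>

#define NUM_SLOTS  4    /* hypothetical; cquake's bins had 32, 6 and 64 slots */
#define NO_TEXTURE (-1)

typedef struct {
    int  texture_id;    /* texture currently resident in this slot */
    long last_used;     /* frame in which the slot was last referenced */
} Slot;

typedef struct {
    Slot slots[NUM_SLOTS];
    int  reloads;       /* uploads that would need a dmaCopy on hardware */
} SlotBin;

static void bin_init(SlotBin *bin) {
    for (int i = 0; i < NUM_SLOTS; i++) {
        bin->slots[i].texture_id = NO_TEXTURE;
        bin->slots[i].last_used = -2;   /* never used: reusable at frame 0 */
    }
    bin->reloads = 0;
}

/* Returns the slot holding texture_id for `frame`, or -1 if every slot
   was referenced in this frame or the previous one. */
static int bin_acquire(SlotBin *bin, int texture_id, long frame) {
    int free_slot = -1;
    for (int i = 0; i < NUM_SLOTS; i++) {
        Slot *s = &bin->slots[i];
        if (s->texture_id == texture_id) {  /* already resident: no upload */
            s->last_used = frame;
            return i;
        }
        if (free_slot < 0 && s->last_used < frame - 1)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;
    /* On hardware: map just this bank to LCDC, dmaCopy the texels, remap. */
    bin->slots[free_slot].texture_id = texture_id;
    bin->slots[free_slot].last_used = frame;
    bin->reloads++;
    return free_slot;
}
```

`bin_acquire` returns -1 when every slot in the bin is still in use; what to do then (skip the texture, fall back to another bin) is left open.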

#173169 - sajiimori - Wed Mar 24, 2010 6:54 pm

The loss of VRAM can be reduced by having 256k for non-animated textures, and two 128k banks for double buffering.

#173170 - gusano - Wed Mar 24, 2010 7:15 pm

sajiimori wrote:
The loss of VRAM can be reduced by having 256k for non-animated textures, and two 128k banks for double buffering.


Orcs & Elves did something like this too... I think they used 128k but yeah, it's the same idea...

in my platformer, it means keeping the level tiles texture always in the 256k static bank but my characters' sprite sheet in the dynamic 128k bank...

#173174 - Exophase - Wed Mar 24, 2010 11:01 pm

You don't have the entire vblank to upload textures, because 3D rendering starts 48 lines before the first line in order to nominally stay 48 lines ahead of the current scanline, or more precisely in order to fall up to 48 lines behind and still be okay. Because of this you actually only get a sliver of vblank time (23 scanlines) to unchain textures from the 3D engine.

VRAM blocks have a 16-bit interface w/o waitstates, so DMA should transfer 1 16-bit element per bus clock if the source location (usually main memory) can keep up, which it should be more or less able to burst through. A scanline lasts 2130 bus clocks, so you can transfer up to this amount:

2130 * 2 * 23 = 97980 bytes = 95.684KB.

But bear in mind that unless you get the CPU running from non-cached areas then you'll probably end up stalling it at some point quickly into the start of this. If you're double buffering then you can DMA straight off the gamecard into the texture VRAM. I think the real issue here, and why they did things the way they did, isn't the throughput to VRAM but the throughput off the gamecard which can probably barely transfer anything in 23 scanlines.
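These numbers and DiscoStew's earlier "scanline 144 to 214" window come from the same model. The C sketch below assumes 263 scanlines per frame, 192 visible lines, a 3D engine running 48 lines ahead and not lagging, and one 16-bit transfer per bus clock at 2130 clocks per line. It ignores source-side limits, CPU stalls and interrupts, so the byte counts are upper bounds. `idle_lines`, `range_len`, `vblank_len` and `max_dma_bytes` are hypothetical names.

```c
#include <assert.h>

#define LINES_PER_FRAME  263
#define VISIBLE_LINES    192
#define CLOCKS_PER_LINE 2130  /* bus clocks per scanline */
#define BYTES_PER_CLOCK    2  /* 16-bit VRAM bus, no waitstates */

typedef struct { int first; int last; } LineRange;  /* inclusive */

/* Lines during which the 3D engine isn't reading texture VRAM, assuming it
   renders `lead` lines ahead of the display and isn't lagging. */
static LineRange idle_lines(int lead) {
    LineRange r;
    assert(lead >= 0 && lead < VISIBLE_LINES);
    r.first = VISIBLE_LINES - lead;         /* last visible line is done */
    r.last = LINES_PER_FRAME - lead - 1;    /* next frame's line 0 starts after */
    return r;
}

static int range_len(LineRange r) { return r.last - r.first + 1; }

/* How many of those idle lines fall inside VBlank. */
static int vblank_len(LineRange r) {
    int first = r.first > VISIBLE_LINES ? r.first : VISIBLE_LINES;
    return r.last >= first ? r.last - first + 1 : 0;
}

/* Upper bound on bytes DMA can move in `lines` scanlines. */
static long max_dma_bytes(int lines) {
    return (long)lines * CLOCKS_PER_LINE * BYTES_PER_CLOCK;
}
```

With a 48-line lead the idle window is lines 144 to 214: 71 lines, or at most 302460 bytes (about 295KB), which is consistent with DiscoStew managing 256KB. Only 23 of those lines fall inside VBlank, giving the 97980 bytes above.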

#173181 - sverx - Thu Mar 25, 2010 8:54 am

Exophase wrote:
You don't have the entire vblank to upload textures, because 3D rendering starts 48 lines before the first line in order to nominally stay 48 lines ahead of the current scanline, or more precisely in order to fall up to 48 lines behind and still be okay. Because of this you actually only get a sliver of vblank time (23 scanlines) to unchain textures from the 3D engine.


This is true if you want to change textures that will be used in the next frame, but he wants to change textures that will be used in the frame after the next one... so actually there's a whole 1/60th of a second...

btw, very interesting calculations, thanks.

#173184 - Exophase - Thu Mar 25, 2010 3:18 pm

sverx wrote:
This is true if you want to change textures that will be used in the next frame, but he wants to change textures that will be used in the frame after the next one... so actually there's a whole 1/60th of a second...

btw, very interesting calculations, thanks.


It doesn't matter when the textures are used. You can't write to VRAM while it's mapped as texture memory. So you still need to map the bank from texture to LCDC when vblank starts, then remap it when 3D rendering starts.

That is, unless you double buffer, like the original post describes. This is the reason why you'd want to, but it comes at a pretty big expense.

#173188 - sajiimori - Thu Mar 25, 2010 10:07 pm

We already double buffer all our animated textures, except on an individual basis, not across whole banks. That is, we load the new texture (during vblank), and queue the old one to be freed after the next GFX_FLUSH, tracking it all with a basic heap allocator.

This is so we can max out 512k, but also upload textures during the intermediate vblank on a 30fps game.

But it's a complicated way to do things, and it causes some fragmentation. This thread is making me reconsider whether double-buffering whole banks would be better for certain sprite-heavy games.

Edit: I guess if I had to pick a single scheme, for an engine that's shared across many genres of games, I'd stick with our complicated approach, because it keeps the VRAM budgets flexible: you can divide up the 512k however you want, between static textures, animated textures, or other things like GBA-style BGs and sprites, and the code Just Works.
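A rough C sketch of the deferred-free part of this scheme. It's an illustration, not sajiimori's code: `DeferredFreeQueue`, `defer_free`, `note_flush_issued` and `retire` are hypothetical; handles are plain ints standing in for whatever the VRAM heap allocator returns; and the caller is assumed to know when the swap for a given flush has happened (for example, in the VBlank after the GFX_FLUSH write). A replaced texture stays alive until the swap of the first flush issued after it was replaced, because until then the 3D engine keeps drawing the old polygon list that references it.

```c
#include <assert.h>

#define MAX_PENDING 32

typedef struct {
    int      handle;  /* allocation handle of the replaced texture */
    unsigned tag;     /* flushes issued when the free was requested */
} PendingFree;

typedef struct {
    PendingFree items[MAX_PENDING];
    int         count;
    unsigned    flushes_issued;
} DeferredFreeQueue;

/* The old texture may still be referenced by the polygon list the 3D engine
   is drawing now, so only queue it. */
static void defer_free(DeferredFreeQueue *dq, int handle) {
    assert(dq->count < MAX_PENDING);
    dq->items[dq->count].handle = handle;
    dq->items[dq->count].tag = dq->flushes_issued;
    dq->count++;
}

/* Call right after writing GFX_FLUSH. */
static void note_flush_issued(DeferredFreeQueue *dq) { dq->flushes_issued++; }

/* Call once the swap for flush number `swapped` (1-based) has taken effect.
   Writes the handles that can now be freed to `out`, returns how many. */
static int retire(DeferredFreeQueue *dq, unsigned swapped, int *out) {
    int kept = 0, freed = 0;
    for (int i = 0; i < dq->count; i++) {
        if (dq->items[i].tag < swapped)
            out[freed++] = dq->items[i].handle;
        else
            dq->items[kept++] = dq->items[i];  /* still possibly in use */
    }
    dq->count = kept;
    return freed;
}
```

In a 30fps loop, a texture replaced during the intermediate VBlank is tagged with the current flush count and released only after the next flush's swap, which is what lets uploads happen between flushes without corrupting the frame on screen.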

#173189 - Exophase - Thu Mar 25, 2010 10:22 pm

sajiimori wrote:
We already double buffer all our animated textures, except on an individual basis, not across whole banks. That is, we load the new texture (during vblank), and queue the old one to be freed after the next GFX_FLUSH, tracking it all with a basic heap allocator.


Since you're still limited to loading the new texture during the scanlines while the 3D engine isn't drawing why bother deferring the free? The 3D engine won't get a chance to draw what you're freeing before the new one updates.

Having heap allocation over your VRAM is one thing, but I don't see the value in keeping the old one around since it increases the VRAM overhead. This means if you're replacing something in VRAM with something else of the same size you can just overwrite it to the same location.

Double buffering is only being performed per bank to get more time to write to VRAM than the 3D engine's blank interval.

#173192 - sajiimori - Fri Mar 26, 2010 2:05 am

As I said, we support uploading textures in the intermediate VBlank on 30fps games. We can't go around blasting textures until the frame is done.

Incidentally, our sprite frames are individually chopped and shrink-wrapped for minimum size, so they're all different sizes, but that's not the real issue.

#173193 - Exophase - Fri Mar 26, 2010 2:25 am

Okay, so it's because you're at 30fps.

Just out of curiosity, why are you at 30fps?

#173194 - sajiimori - Fri Mar 26, 2010 5:25 am

Um, use your imagination?

Sorry, I just don't see the point of that question, coming from someone smart enough to think up something to do with an entire extra vdraw.

#173197 - Exophase - Fri Mar 26, 2010 5:59 am

sajiimori wrote:
Um, use your imagination?

Sorry, I just don't see the point of that question, coming from someone smart enough to think up something to do with an entire extra vdraw.


Similarly I'm not sure I see the point of your response. Just because I could think of something to do doesn't mean it'd be the thing you're doing. You should just humor me ;P

#173199 - sajiimori - Fri Mar 26, 2010 8:49 am

I don't know what you expect me to say, man. We use the time to do more stuff.