#163214 - gamefreaks - Thu Sep 25, 2008 3:04 pm
Hi,
Firstly, sorry for question(s) on first post! I'm afraid I am one of these people who lurks unregistered for a few weeks and then only bothers to register when I need to post.
I seem to have inherited a DS with a card reader, so I decided to do a project on it. I have got my dev env set up (VS2005 + DevKitARM + LibNDS + No$). My combined ARM7/9 project compiles and runs fine on No$ and HW.
I am new to DS Dev, but not to mobile game dev, (I did gfDoom for those of you with a PocketPC)
So far I have written a MaxScript to export my 3d environments to a text file, and I have written a program that converts the model to fixed point, views the model and exports out into a binary format for loading on the DS. Also optionally builds a pruned quad tree for the level. (Large chunks of the level may be out of view.)
I am also working on a DirectDraw style surface-based system for the menus since I am (frivolously!) going to be running an in-game demo on the top screen and using the bottom screen for menus. (Using 384k for Textures, + 1 Ext-Rot 256x256 15bpp BG for the bottom screen)
The problem I have at the minute is I don't really know what the DS can do in terms of 3d. Looking over the commercial games, it seems it has good potential though.
So:
1) HW Limit is 6144Verts = 2048Tris. Post backface culling and clipping: Can the DS really do 2k textured, lit, transformed and anti-aliased tris at 60FPS? Or is this some theoretical limit?
2) I am currently using immediate mode to draw 3d. (IE: glNormal(), glVertex() x3, etc.) I understand that it is faster to compile this data into a display list and cram it down a fifo. But what sort of gains are we looking at like-for-like % wise? (Or is the main gain that you can continue running code while the Dlist is being copied | decoded | drawn?)
3) How much does a texture change cost? (Assuming it is already in video memory) Is it worth sorting meshes by texture first?
4) DTCM & ITCM: I understand that the DS has a few KB of faster memory (presumably close to the CPU, so less access time?). How do I make data/code use this? How is it most effectively used?
5) I know this can be done with normals in MaxScript, but does anyone know a way of getting 3dsMax to display the number of rendered polygons in the viewport? I know how the polygon counters work, but it would be useful to get a feel for how many polys are actually being drawn post culling & clipping.
Sorry for the long post and thanks for reading!
#163216 - Dwedit - Thu Sep 25, 2008 3:34 pm
I think I can answer the thing about ITCM and DTCM.
You declare a function with
void foobar (void) __attribute__ ((section (".itcm")));
Probably wrap that in a #define somewhere.
Also, by default, the system stack is in DTCM. It's easy to exceed the limit if you are not careful. One easy way to put a variable into DTCM is to simply allocate it on the stack. I think global variables might also be allocated in DTCM? (not sure about this)
If you need a bigger stack than 16k, you can modify the CRT0 file to set the stack to the 4MB main memory block, but that is slower ram (but at least it's cached).
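The `#define` wrapper suggested above could look something like the sketch below. `MY_ITCM_CODE` and `fast_abs_diff` are made-up names for illustration, the `.itcm` section name is simply the one quoted in this post, and it only has any effect if the linker script really maps that section into ITCM. It also assumes the DS build defines `ARM9`; on any other build the macro expands to nothing so the same code still compiles on a PC. In a real libnds project the library's own `ITCM_CODE` macro is probably the better choice.

```cpp
#include <cstdint>

// Hypothetical wrapper macro. ARM9 is assumed to be defined by the DS build.
#if defined(ARM9)
#define MY_ITCM_CODE __attribute__((section(".itcm")))
#else
#define MY_ITCM_CODE /* plain function on non-DS builds */
#endif

// The declaration carries the attribute, as in the post above.
std::int32_t fast_abs_diff(std::int32_t a, std::int32_t b) MY_ITCM_CODE;

// The definition itself needs no extra decoration.
std::int32_t fast_abs_diff(std::int32_t a, std::int32_t b)
{
    return (a > b) ? (a - b) : (b - a);
}
```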
#163222 - zeruda - Thu Sep 25, 2008 11:13 pm
gamefreaks wrote: |
1) HW Limit is 6144Verts = 2048Tris. Post backface culling and clipping: Can the DS really do 2k textured, lit, transformed and anti-aliased tris at 60FPS? Or is this some theoretical limit? |
No, it's very real; if there are more than that, they simply don't get drawn.
Quote: |
2) I am currently using immediate mode to draw 3d. (IE: glNormal(), glVertex() x3, etc.) I understand that it is faster to compile this data into a display list and cram it down a fifo. But what sort of gains are we looking at like-for-like % wise? (Or is the main gain that you can continue running code while the Dlist is being copied | decoded | drawn?) |
Speed gain I think is possibly measured in hundreds of percent, definitely a large amount at least. That's even before any parallelism.
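For anyone wondering what "compiling into a display list" means in practice, here is a rough sketch of building one triangle as a packed command list on the host side. The command IDs and the packed layout (four command bytes per word, followed by their parameters) are taken from GBATEK rather than from this thread, so treat them as assumptions; all function and constant names are invented for the example.

```cpp
#include <cstdint>
#include <vector>

// Geometry command IDs as listed in GBATEK (assumed, not from the thread).
constexpr std::uint8_t CMD_NOP    = 0x00;
constexpr std::uint8_t CMD_COLOR  = 0x20;
constexpr std::uint8_t CMD_VTX_16 = 0x23;
constexpr std::uint8_t CMD_BEGIN  = 0x40;
constexpr std::uint8_t CMD_END    = 0x41;

// Four command IDs share one 32-bit word; their parameters follow in order.
constexpr std::uint32_t pack_commands(std::uint8_t c1, std::uint8_t c2,
                                      std::uint8_t c3, std::uint8_t c4)
{
    return std::uint32_t(c1) | (std::uint32_t(c2) << 8) |
           (std::uint32_t(c3) << 16) | (std::uint32_t(c4) << 24);
}

// 15-bit colour, 5 bits per channel, red in the low bits.
constexpr std::uint32_t rgb15(unsigned r, unsigned g, unsigned b)
{
    return r | (g << 5) | (b << 10);
}

// VTX_16 takes two parameter words: x | (y << 16), then z.
void push_vertex16(std::vector<std::uint32_t>& out,
                   std::int16_t x, std::int16_t y, std::int16_t z)
{
    out.push_back(std::uint32_t(std::uint16_t(x)) |
                  (std::uint32_t(std::uint16_t(y)) << 16));
    out.push_back(std::uint32_t(std::uint16_t(z)));
}

// Builds: [word count] BEGIN(triangles) COLOR VTX VTX | VTX END
std::vector<std::uint32_t> build_triangle_list()
{
    const std::int16_t one = 1 << 12;   // 1.0 in 4.12 fixed point
    std::vector<std::uint32_t> list;
    list.push_back(0);                  // placeholder for the word count
    list.push_back(pack_commands(CMD_BEGIN, CMD_COLOR, CMD_VTX_16, CMD_VTX_16));
    list.push_back(0);                  // BEGIN parameter: 0 = separate triangles
    list.push_back(rgb15(31, 0, 0));    // COLOR parameter
    push_vertex16(list, -one, -one, 0);
    push_vertex16(list, one, -one, 0);
    list.push_back(pack_commands(CMD_VTX_16, CMD_END, CMD_NOP, CMD_NOP));
    push_vertex16(list, 0, one, 0);
    list[0] = static_cast<std::uint32_t>(list.size() - 1); // words after the count
    return list;
}
```

On the DS, an array laid out like this (count first) is the kind of thing libnds's `glCallList` is meant to take, and the whole block then goes to the geometry engine in one transfer instead of one register write per call.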
Quote: |
3) How much does a texture change cost? (Assuming it is already in video memory) Is it worth sorting meshes by texture first? |
I think it's one cycle if it's in a display list.
Quote: |
4) DTCM & ITCM: I understand that the DS has a few KB of faster memory (presumably close to the CPU, so less access time?). How do I make data/code use this? How is it most effectively used? |
Just put ITCM_CODE before the function implementation:
ITCM_CODE returnparameter FunctionName()
The stack is located in the DTCM. I think if you prefix a variable with DTCM_DATA it will be put there permanently, which reduces the space available for the stack. I would recommend not doing this. If you want something in the fast memory, just declare a local variable and it'll go on the stack. When it goes out of scope the space is freed, so you can then put something else there.
When using DMA (like for display lists), main RAM is locked. You can, however, continue with anything in the CPU, cache, ITCM and DTCM. As soon as you try to touch main RAM, the CPU will stall until the DMA is complete. You can pass local vars by reference if you need the data in them in multiple functions, to control how often they are created/initialised and destroyed. Here's an example of usage:
Code: |
// Stuff2 is created twice, once each time FastFunction is called; Stuff1 is created once only and passed by reference
ITCM_CODE void FastFunction(MyType &StuffRef) // This function is put in ITCM
{
    MyType Stuff2; // Local variable, so it lives on the stack (in DTCM)
    ....
}
void MainRamFunction() {} // This function is located in main RAM
void NormalFunction()
{
    MyType Stuff1;
    InitialiseData(Stuff1);
    SendDisplayListToHW(List1);
    FastFunction(Stuff1); // Executes in parallel with the transfer
    SendDisplayListToHW(List2);
    MainRamFunction(); // This will wait until the display list is processed
    FastFunction(Stuff1); // Too late, the transfer has already finished
} |
#163233 - sverx - Fri Sep 26, 2008 4:33 pm
zeruda wrote: |
gamefreaks wrote: | 1) HW Limit is 6144Verts = 2048Tris. Post backface culling and clipping: Can the DS really do 2k textured, lit, transformed and anti-aliased tris at 60FPS? Or is this some theoretical limit? |
No, its very real, if there's more than that they simply don't get drawn. |
mmm, I'm quite a newbie about the topic, but I think the question was something like "does the DS really render 2048 triangles in every situation?". The answer is NO. For instance, if there are too many triangles that need to be rendered (say, in a little part of the screen), it can happen that the line buffer gets empty while the 3D hardware is calculating one of those very complex scanlines, so the 3D will slow down or draw 'incomplete' lines.
Here is where I read it.
#163240 - silent_code - Fri Sep 26, 2008 7:41 pm
There is some sort of limit for overdraw (around 4 times) and for the amount of triangle fragments to be drawn (or processed?) per scanline. IIRC, both these issues are in fact the same, or in other words: they have a common cause.
Although, that shouldn't be a problem in most cases.
Using immediate mode is likely to stall the graphics command buffer, causing a frame to be dropped: you'll get 30 Hz without hitting any obvious limits - there's a thread I started about that. But all in all, the hardware is designed (and works that way) to render its peak amount of "fully feature enabled" triangles (everything "on"), its peak amount of fragments and every bit of 2D graphics at 60 Hz. So, it is possible to fully use the hardware, if your software is clever enough.
Additionally, many features (math, tests etc.) are asynchronous! :^)
A maximum of 48 rendering lines can get buffered, which are rendered up ahead of the display refresh, at the start of a frame. The more you stress the rendering (e.g. with overdraw, big triangles etc.), the slower it will become and fragments might get dropped at some point, because the display refresh catches up with the rendering engine (the rasterizer).
There's a status register that shows you the amount of the least free rendering line buffers throughout the whole last frame (0..46 - so, at least one is always "used"). It's a general indicator of how "complex" the last frame's rendering has been. The higher the value, the better it is (so you can do more "complex" rendering).
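As a rough sketch of how that indicator might be checked each frame: the register meant here is, as far as I can tell, the one GBATEK calls RDLINES_COUNT at 0x04000320, with the value in the low 6 bits. Both the address and the bit layout are assumptions taken from GBATEK, not from this thread, and the function names and the warning threshold are invented for the example.

```cpp
#include <cstdint>

// Assumed from GBATEK: RDLINES_COUNT lives at 0x04000320.
constexpr std::uint32_t RDLINES_COUNT_ADDR = 0x04000320;

// Only the low 6 bits carry the count (0..46); mask off everything else.
constexpr unsigned decode_rdlines(std::uint32_t raw)
{
    return raw & 0x3F;
}

// Hypothetical helper: flags a frame whose buffer headroom fell below 'margin'.
// Lower values mean the rasterizer was closer to being caught by the display.
constexpr bool frame_was_near_underflow(std::uint32_t raw, unsigned margin = 4)
{
    return decode_rdlines(raw) < margin;
}

#if defined(ARM9)
// Only meaningful on the DS itself; ARM9 is assumed to be defined by the build.
inline std::uint32_t read_rdlines_raw()
{
    return *reinterpret_cast<volatile std::uint32_t*>(RDLINES_COUNT_ADDR);
}
#endif
```

Logging the decoded value once per frame while flying through a level should give a quick feel for which views are too heavy.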
Good luck! :^D
Best regards. :^)
#163313 - gamefreaks - Mon Sep 29, 2008 2:23 pm
Thanks for the replies. They are most useful. (Especially the DTCM/ITCM info)
This weekend I got my 3d model classes, and 2d blitter up and running and so am in a better position to actually do some programming on the DS itself.
Some initial observations:
1) Wow, floating point is really slow! I did my scaled blit in floating point first to get the routine correct. (1 float divide per blit, then 1 floating point addition per pixel.) This was sped up by about 50x by converting to fixed point. Similarly, changing a single glRotate to glRotatei per frame made a noticeable difference.
2) TexCoords are absolute, had me puzzled for a while as to why my normalised UVs were not working.
3) Had some weird cache issues with alpha blended blits. Worked fine in No$ but had problems with flickering on real HW. Workaround was to call DC_FlushAll at the end of the frame, but need to research exactly why this is happening so I can fix properly.
4) 3d is very fidgety!
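For reference, the fixed point version of a scaled blit along the lines of point 1 might look like the sketch below for a single row: one integer divide up front, then one integer add per pixel. This is not the actual blitter from the project, just a minimal nearest-neighbour example; `scale_row` and the 16.16 format are choices made for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// Nearest-neighbour horizontal scale of one row of 16-bit pixels,
// stepping through the source in 16.16 fixed point.
void scale_row(const std::uint16_t* src, int src_w,
               std::uint16_t* dst, int dst_w)
{
    // Keep src_w small enough that (src_w << 16) fits in 32 bits.
    assert(src_w > 0 && src_w < 32768 && dst_w > 0);

    // One integer divide per row replaces the float divide.
    const std::uint32_t step =
        (static_cast<std::uint32_t>(src_w) << 16) / static_cast<std::uint32_t>(dst_w);

    std::uint32_t pos = 0;              // source position, 16.16
    for (int x = 0; x < dst_w; ++x) {
        dst[x] = src[pos >> 16];        // integer part selects the source pixel
        pos += step;                    // one integer add per pixel
    }
}
```

Because `step` is rounded down, `pos >> 16` never reaches `src_w`, so the loop cannot read past the end of the source row.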
#163316 - M3d10n - Mon Sep 29, 2008 3:06 pm
gamefreaks wrote: |
2) TexCoords are absolute, had me puzzled for a while as to why my normalised UVs were not working. |
If you don't want to reconvert your models when your texture sizes change, you can generate UVs for a fixed texture size (like 32x32) and use the texture matrix to scale the texture so it looks correct.
#163326 - gamefreaks - Mon Sep 29, 2008 5:32 pm
M3d10n wrote: |
gamefreaks wrote: |
2) TexCoords are absolute, had me puzzled for a while as to why my normalised UVs were not working. |
If you don't want to reconvert your models when your texture sizes change, you can generate UVs for a fixed texture size (like 32x32) and use the texture matrix to scale the texture so it looks correct. |
Hmm...I like that! Another problemette I have at the min is TexCoord accuracy with 0-1 << 4 UVs.
Am I right in thinking that, for example, with 0-64 << 4 UVs and a 128*128 texture, I can simply use:
Code: |
...
glMatrixMode(GL_TEXTURE);
glLoadIdentity();
GLvector tv = {inttof32(2),inttof32(2),0};
glScalev(&tv);
...
|
#163343 - M3d10n - Tue Sep 30, 2008 3:59 am
Yep, that's it. You might want to toy around until you find a fixed uv scale that suits you: creating your UVs for bigger textures *might* improve sub-texel accuracy (I haven't tested it too much) at the cost of having a lower max number of tiles (example: at 32x32, you can tile up to 64 times).
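The arithmetic behind the scale factor and the tiling limit can be sketched as follows. It assumes texcoords are signed 16-bit values with 4 fraction bits, and that matrix entries use 12 fraction bits as produced by `inttof32`; the helper names are made up. It also assumes the texture has texcoord transformation enabled (I believe that is `TEXGEN_TEXCOORD` in libnds), otherwise the texture matrix would not be applied at all.

```cpp
#include <cassert>
#include <cstdint>

constexpr int TEXCOORD_FRAC_BITS = 4;   // assumed 12.4 texcoord format
constexpr int F32_FRAC_BITS = 12;       // assumed 20.12 matrix format (inttof32)

// Converts a whole-texel coordinate to the packed texcoord value.
constexpr std::int16_t to_t16(int texels)
{
    return static_cast<std::int16_t>(texels << TEXCOORD_FRAC_BITS);
}

// Texture-matrix scale (20.12) that maps UVs authored for 'authored_size'
// texels onto a texture that is really 'real_size' texels wide.
constexpr std::int32_t uv_scale_f32(int authored_size, int real_size)
{
    return (static_cast<std::int32_t>(real_size) << F32_FRAC_BITS) / authored_size;
}

// Approximate repeats available in one direction: the range is about
// 2048 texels (the positive end stops 1/16 of a texel short of that).
constexpr int approx_max_tiles(int authored_size)
{
    return 2048 / authored_size;
}
```

With UVs authored for 64 texels and a 128x128 texture, `uv_scale_f32(64, 128)` gives 8192, which is the same value as `inttof32(2)` in the snippet above, and `approx_max_tiles(32)` gives the 64 repeats mentioned for 32x32.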
#163665 - gamefreaks - Wed Oct 08, 2008 10:59 am
Thanks. Moving to the 0-8 << 4 u,v format gave me much better texture alignment accuracy.
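A quick way to see why the wider range helps, assuming texcoords carry 4 fraction bits: the smallest UV step, measured in texels of the real texture, shrinks as the authored range grows. The function name is invented, and this ignores any precision lost in the texture matrix multiply itself.

```cpp
#include <cassert>

// Size of one texcoord step in real texels, for UVs authored over
// 0..authored_range (before the << 4) and stretched onto real_size texels.
double texels_per_uv_step(int authored_range, int real_size)
{
    assert(authored_range > 0 && real_size > 0);
    const int steps = authored_range * 16;      // 4 fraction bits = 16 steps per unit
    return static_cast<double>(real_size) / steps;
}
```

For a 128x128 texture this works out to 8 texels per step with 0-1 UVs, 1 texel per step with 0-8, and an eighth of a texel with 0-64.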